Partial discharge is an electric discharge that does not completely bridge the
insulation between the electrodes. It gives rise to electric pulses having magnitude (q) and
phase position (φ) with respect to the applied voltage waveform. It has been recognized that
the breakdown of the insulation of electric equipment is often due to the occurrence of
partial discharge within or on the surface of the insulation. Therefore, if partial discharge is
found to occur in any insulation system, it is important to identify its source. The most
important step in the classification process is to obtain the exact fingerprints which could
represent the different partial discharge sources successfully. During the past decade,
partial discharge fingerprints have commonly been formed by phase resolved pulse height
analysis using the (q-φ-n) distributions, where n is the repetition rate of partial discharge
pulses. Conventionally, classification of partial discharge sources has been done with the
help of the shape of these distributions. The investigations in this thesis have been
motivated by the fact that these distributions have the following drawbacks:

(a) They suffer from the averaging effect.

(b) They do not take the memory propagation between the partial
discharge pulses into consideration.

Texture analysis algorithms have been successfully applied in the field of pattern
recognition, especially in image processing applications, where the texture features
contain information about the spatial distribution of spectral variation. Generally, an image
is divided horizontally and vertically into a number of pixels. Each pixel has its own gray
level. Texture analysis is used to study the gray level variation of an image in different
directions (horizontal, vertical, left diagonal and right diagonal). To apply these algorithms
to partial discharge source classification, if the measurement is made over a number (say M)
of power frequency cycles and each cycle is divided into a number of windows (say N), an
image of M×N pixels is obtained. In this study, the gray level has been replaced by the
magnitude of partial discharge pulses so that the texture features can be used to give an
overview of every single partial discharge pulse of the whole measurement. Investigations in
this thesis have been conducted in the horizontal and vertical directions only, in order to
study the relationship between each pulse and the adjacent pulses in the same cycle as well
as the relationship between a pulse and the other pulses at the same phase angle in different
cycles.
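
The mapping from a pulse record to an M×N pattern can be illustrated with a minimal sketch. It is not the implementation used in the thesis: the function names, the input layout (cycle index, phase in degrees, magnitude) and the rule of keeping the largest pulse when several fall into the same pixel are all assumptions made for illustration.

```python
import numpy as np

def build_pd_image(cycle_index, phase_deg, magnitude, n_cycles, n_windows):
    """Return an M x N array: rows are cycles, columns are phase windows."""
    image = np.zeros((n_cycles, n_windows))
    phase = np.asarray(phase_deg, dtype=float) % 360.0
    # Column = index of the phase window (each window spans 360/N degrees).
    cols = np.minimum((phase / 360.0 * n_windows).astype(int), n_windows - 1)
    rows = np.asarray(cycle_index, dtype=int)
    # Assumption: if several pulses share a pixel, keep the largest magnitude.
    np.maximum.at(image, (rows, cols), np.abs(np.asarray(magnitude, dtype=float)))
    return image

def neighbour_differences(image):
    """Absolute magnitude differences between neighbouring pixels."""
    horizontal = np.abs(np.diff(image, axis=1))  # adjacent windows, same cycle
    vertical = np.abs(np.diff(image, axis=0))    # same window, successive cycles
    return horizontal, vertical

# Hypothetical measurement: 2 cycles, 4 windows, pulse magnitudes in pC.
image = build_pd_image([0, 0, 0, 1], [10, 100, 100, 350], [5, 3, 7, 2], 2, 4)
horizontal, vertical = neighbour_differences(image)
```

The horizontal differences relate a pulse to its neighbours within one cycle, while the vertical differences relate pulses at the same phase position in successive cycles, which are the two directions studied in this work.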



This thesis has made an attempt to understand some of the issues in obtaining
reliable features from texture analysis algorithms and to evaluate their ability to classify
different partial discharge sources. The main objectives behind the research work carried
out in this thesis have been to:

(a) compare the discriminating power of different texture analysis algorithms for
the partial discharge source classification, including a minimum distance
classifier, transformed divergence analysis and artificial neural network based
methods.

(b) compare the discriminating power of the texture analysis algorithms with the
discriminating power of the conventional (q-φ-n) distributions method for
partial discharge source classification.

(c) determine the relative classification accuracy of each feature individually.

(d) determine the minimum number of features used at a time to achieve a good
classification accuracy.

(e) investigate the effect of changing the number of cycles on the classification
accuracy.

(f) determine the best feature selection technique in view of minimizing the
number of features required to classify different partial discharge sources
accurately.

(g) determine which direction (horizontal or vertical) has more discriminating
power to minimize the computational time.

The work reported in this thesis has been organized into nine chapters:

Chapter 1 introduces the partial discharge phenomena, reviews the conventional methods 
for partial discharge classification, and briefly outlines the work carried out in this thesis. 

Chapter 2 presents the theoretical background of four different texture analysis algorithms.
These algorithms are the spatial gray level dependence method (SGLDM), the gray
level difference histogram method (GLDHM), the gray level run length method (GLRLM),
and the power spectral method (PSM).

Chapter 3 describes the experimental setup used for partial discharge measurements and
the samples used to simulate the partial discharge sources. The partial discharge sources
created in the laboratory include glow corona, streamer corona, surface discharge, internal
discharge, single protrusion and multiple protrusions.

Chapter 4 describes the application of the minimum distance classifier for partial discharge
classification using all the features based on the texture analysis algorithms. The
classification accuracy of each algorithm has been investigated, and the classification
accuracy of each feature individually as well as for the combination of two features at a
time has also been studied.

Chapter 5 presents the application of a direct feature selection technique known as the
‘transformed divergence analysis’ to measure the distance between the classes in the
feature space. The optimum features have been selected according to the maximum
separation between the different partial discharge sources.

Chapter 6 investigates the use of a back propagation artificial neural network (ANN) for
partial discharge classification. A three-layer feed-forward ANN has been developed for
this purpose.

In Chapter 7, the GLDHM features have been used to distinguish two partial discharge
sources generated from a high voltage cable as a practical example. A belted cable has been
used to generate internal discharge between the insulating papers and also surface
discharge at the terminals of the cable.

In Chapter 8, an indirect method known as ‘principal component transformation’ has been
used to reduce the number of features by mapping the original set of features into a new,
smaller set of features in which the separation between the classes (partial discharge
sources) is maximum. The classification accuracy of the main principal components has been
determined for different partial discharge sources.

Chapter 9 presents a summary of the main conclusions of the thesis and also includes a few
recommendations for further work.


The work carried out in this thesis reveals the following: 
1. Texture analysis algorithms can be effectively used to generate different
features which are capable of distinguishing between different partial discharge
sources.

2. Amongst the four texture analysis algorithms, GLDHM has a classification
accuracy on par with the other algorithms and is computationally faster.
Therefore, it is recommended for partial discharge source
classification.

3. With the use of the minimum distance classifier and also the transformed
divergence analysis for each of the algorithms, two features used at a time are
found sufficient to achieve a considerable classification accuracy. However, the
best combinations depend upon the number of cycles used to construct the
patterns and also on the partial discharge sources used for classification.

4. The classification accuracy achieved with the ANN is less than that achieved
with the minimum distance classifier.

5. Using the principal component transformation, two principal components are
found to be sufficient to achieve the desired classification accuracy, which is
found to be independent of the number of cycles used to construct the partial
discharge patterns and also of the partial discharge sources used for classification.
Hence, the principal component transformation is recommended as a technique
to reduce the number of features for the partial discharge source classification.

Chapter 1
Introduction 


1.1 General Introduction 

Electric discharges which do not bridge the electrodes are called partial discharges [Kreuger,
1989]. In other words, a partial discharge is a local breakdown in a dielectric, which loses its
insulating property locally and not globally. It may originate in the dielectric directly at one
of the electrodes or occur in a cavity. The partial discharge phenomenon results in very short
duration electric pulses. Although the magnitude of such pulses is small, they can cause
progressive deterioration and ultimate failure of the insulation. Therefore, it is essential to
detect their presence in order to ensure reliable operation of high voltage equipment.
Furthermore, a meaningful interpretation of partial discharge measurements is required in
order to identify the source. Several methods are available for partial discharge detection
and classification. This chapter introduces some of the existing approaches for partial
discharge detection and classification, reviews some of the literature on the
topic and sets out the motivation behind the present work.

1.2 Types of Partial Discharge 

Partial discharges are categorised into the following three groups [Kreuger, 1989]:

(1) Corona or gas discharge: Partial discharge at a free electrode in a gaseous dielectric is
known as corona. Depending upon the electrode shape and the gap distance, it can
be of three types, namely glow, streamer and leader corona.

(2) Surface discharge, also known as tracking, which occurs on the interface of solid or
liquid dielectric materials with gaseous dielectrics. Surface discharge may occur if there
exists an electric stress component in the direction of the dielectric surface, as in
bushings, insulators and ends of cables.

(3) Internal discharge: Internal discharges in a dielectric occur due to foreign particle
inclusions during manufacturing or operation as well as in cavities. Cavities could be
filled with gas or oil. The material in the cavities breaks down
at a stress which is much lower than the breakdown strength of the
surrounding dielectric. The dielectric strength of the gas filled cavities depends on:

(i) The kind of gas in the cavity 

(ii) The gas pressure 

(iii) The shape and direction of the cavity. 

Oil filled cavities occur between layers of oil impregnated paper insulation such as in
transformer windings and in cables. The breakdown strength of oil strongly depends upon
contamination and the amount of dissolved gases. If the oil breaks down, gas bubbles
are produced and gas discharge may occur.

1.3 Recurrence of Discharge

When the voltage across a cavity reaches the breakdown value, the cavity may break down.
The voltage breakdown takes place in less than 10 psec [Kreuger, 1989]. This is an extremely
short time compared to the duration of a 50 Hz cycle and hence the voltage drop may
be regarded as a step function. The breakdown of the cavity is determined by the
superposition of the main electric field and the field of the surface charge left behind at the
cavity walls after the last discharge. The voltage across the cavity then starts
increasing again until it reaches the breakdown voltage, when a new discharge occurs. Thus
several discharges may take place during the rising part of the applied voltage. Similarly,
on the decreasing part of the applied voltage, cavity discharges occur as the voltage
across the cavity reaches the negative of the breakdown value. In this way, groups of
discharges originate from a cavity and give rise to positive and negative current pulses on
the rising and falling parts of the voltage, respectively, as shown in Fig. 1.1.



Fig. 1.1: Partial discharge pulses with respect to the applied voltage
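
The recurrence mechanism described above can be reproduced with a small numerical sketch. It is only an idealisation: the cavity voltage is assumed to collapse to exactly zero at each discharge (a step change), the breakdown value is assumed equal for both polarities, and `v_peak` denotes the peak voltage that would appear across the cavity if no discharge occurred. The function name and parameters are hypothetical.

```python
import math

def simulate_cavity_discharges(v_peak, v_breakdown, steps_per_cycle=2000, n_cycles=1):
    """Return (phase_deg, polarity) for each discharge of an idealised cavity."""
    events = []
    offset = 0.0  # voltage cancelled so far by surface charge on the cavity walls
    for k in range(steps_per_cycle * n_cycles):
        phase = 2.0 * math.pi * k / steps_per_cycle
        # Superposition of the applied field and the field of the surface charge.
        v_cavity = v_peak * math.sin(phase) - offset
        if abs(v_cavity) >= v_breakdown:
            polarity = 1 if v_cavity > 0 else -1
            events.append((math.degrees(phase) % 360.0, polarity))
            offset += v_cavity  # step collapse: cavity voltage returns to zero
    return events

# Hypothetical values: cavity voltage peak is 3.5 times the breakdown value.
events = simulate_cavity_discharges(v_peak=3.5, v_breakdown=1.0)
rising = [p for ph, p in events if ph < 90.0]
falling = [p for ph, p in events if 90.0 < ph < 270.0]
```

With these values the sketch produces a group of positive pulses on the rising part of the voltage and a group of negative pulses on the falling part, which is the grouping sketched in Fig. 1.1.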

1.4 Detection of Discharge

The detection of discharges is based on the energy exchanges which take place during the
discharge. An electrical discharge is a transitory disturbance which radiates electromagnetic,
acoustic and thermal energy from the discharge site. These exchanges are
manifested as electrical impulse current, dielectric loss, chemical transformation, heat, gas
pressure, sound and light [Kuffel et al, 1984]. Generally, there are two types of partial
discharge detection methods: non-electrical and electrical methods.

1.4.1 Non-electrical detection methods 

Both insulating materials and lubricating oils are complex organic materials which, when
degraded by heat or electric action, produce a very large number of chemical products in
the gas, liquid and solid states [Tavner et al, 1987]. Electrical discharge activity within or
adjacent to the insulating system also releases chemical degradation products. It breaks
down oxygen to give ozone. Furthermore, continuous partial discharge activity gradually
carbonises the insulating materials to produce, on a smaller scale, the degradation products
which, for example, are caused by local overheating.

Sound is a longitudinal mechanical wave motion in an elastic medium and is classified 
according to its frequency as [Tuma, 1976]: 

(i) audible by human ear (frequency between 20-20000 Hz) 

(ii) infrasonic (below the response of the human ear) 

(iii) ultrasonic (above the response of the human ear)

Sound detection could be audible or ultrasonic. Usually a narrow band of about 30 to 50
kHz is chosen (just above the audible spectrum). Several ultrasonic detection systems are
available commercially. The signals are converted to audible sound and their magnitude
can be read from a decibel meter. Ultrasonic techniques are useful for the detection and
location of partial discharge sources, especially for oil filled transformers. Investigation of
the ultrasonic frequency spectrum from 0.15 to 1.75 MHz has revealed that the ultrasonic
spectrum signatures of some partial discharge sources, like voids and spark gaps, are
different and thus can aid in partial discharge identification [Harrold, 1975].

Light detection can only be used for surface discharge and corona. The sensitivity of light
detection can be improved appreciably by using a photomultiplier [Kreuger et al, 1988].
However, it shares with other non-electrical methods the disadvantage that the discharge
magnitude cannot be measured. The non-electrical detection methods are not commonly
used because they are less sensitive than the electrical ones.

1.4.2 Electrical detection methods 

The primary characteristics which partial discharge detectors have in common and, 
therefore, could provide a basis for evaluation of the detector are the number of inputs 
employed, the bandwidth of the detector and the method of display processing [Steiner, 
1991]. The three basic types of displays are: 

(i) The meter display 

(ii) The direct display 

(iii) The computer driven display.

The meter methods are implemented using either a digital panel meter or an analog meter
and display a number related to some parameter of the partial discharge process. Also,
more than one meter can be used as an auxiliary display device so that several parameters
can be monitored simultaneously. The main disadvantage of the meter display is that it is not
able to distinguish between a single discharge of highest magnitude and the sum of
multiple discharges of smaller magnitudes. The long time-constant of the voltmeter, for
example, causes it to behave somewhat like an integrator, providing only a measure of an
average value [Steiner, 1991]. Direct display is the best choice. The discharge impulses
are usually displayed on a time base of the same frequency as the applied voltage.
Recurrent discharges in successive cycles cover each other and a stationary picture is
obtained. From the pattern of a discharge on the screen of the oscilloscope, the source of
that discharge can be determined. Computer driven display requires that the detected partial
discharge pulses be digitised either into a set of values or into waveforms. The most common
electrical partial discharge detectors are:

1.4.2.1 Schering bridge 

The occurrence of partial discharge in electric equipment was already recognised by
Peterson in 1912. At that time its presence was normally detected
either visually or audibly, at sufficiently high intensity. Following the development of the
Schering bridge in the 1920s, it became possible to record or detect discharge in terms of
its influence upon the dissipation factor (tan δ) value. The power loss P in a dielectric can
be expressed by

$$P = \omega C V^{2} \tan\delta$$

where ω is the angular frequency, C the specimen capacitance, V the applied voltage and tan δ
the dissipation factor. It was noted that a plot of log(P) versus log(V) is a straight line having
a slope equal to two so long as the tan δ value remains approximately constant with the
applied voltage [Bartnikas, 1987]. Following partial discharge inception with increasing
voltage, the increase in tan δ with voltage becomes perceptible, and consequently the slope of
the line exceeds the value of 2. Hence, after partial discharge inception, the plot will consist
of two straight lines whose intersection will determine the partial discharge inception voltage.
The Schering bridge is a narrow band detector and is normally implemented as a two input
detector. Since the bandwidth is small, an integrated response is provided. Therefore,
normal background partial discharge masks the presence of larger pulses. It is not accurate
due to the following reasons:

1- Not every increase in dielectric losses coincides with discharges.

2- The start of the increase in tan δ is often difficult to determine.

3- The amount of losses due to partial discharge is quite negligible as compared to the
other power losses in the dielectric.
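
The two-line construction on the log(P) versus log(V) plot can be sketched numerically. The following is an illustration only, using synthetic data in which tan δ is constant up to an assumed inception voltage of 10 kV and then grows in proportion to V; the breakpoint search by least squares and all names and values are assumptions, not part of the bridge method itself.

```python
import numpy as np

def inception_voltage(voltage, power, min_points=3):
    """Fit two straight lines in log-log space and return their intersection."""
    x, y = np.log10(voltage), np.log10(power)
    best = None
    for split in range(min_points, len(x) - min_points + 1):
        lo = np.polyfit(x[:split], y[:split], 1)   # line below inception
        hi = np.polyfit(x[split:], y[split:], 1)   # line above inception
        err = (np.sum((np.polyval(lo, x[:split]) - y[:split]) ** 2)
               + np.sum((np.polyval(hi, x[split:]) - y[split:]) ** 2))
        if best is None or err < best[0]:
            best = (err, lo, hi)
    _, (m1, c1), (m2, c2) = best
    x_cross = (c1 - c2) / (m2 - m1)  # where the two fitted lines meet
    return 10.0 ** x_cross, m1, m2

# Synthetic specimen (hypothetical values): 50 Hz, 1 nF, tan(delta) = 0.002.
omega, cap, tan_delta_0 = 2.0 * np.pi * 50.0, 1e-9, 0.002
v = np.arange(2, 21) * 1e3                        # 2 kV ... 20 kV
tan_delta = tan_delta_0 * np.where(v > 10e3, v / 10e3, 1.0)
p = omega * cap * v ** 2 * tan_delta              # P = omega C V^2 tan(delta)
v_inc, slope_below, slope_above = inception_voltage(v, p)
```

For clean synthetic data the lower line has a slope of two and the upper line a steeper slope. With measured data the start of the increase is much harder to locate, which is the second limitation listed above.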

1.4.2.2 Radio interference voltage 

The Radio Interference Voltage (RIV) method is one of the oldest partial discharge measurement
methods. This measurement technique originates from electromagnetic interference
(EMI) measurements and was not designed for, but adapted to, partial discharge
measurement. The RIV method is still common in the transformer industry and is also used
for measuring partial discharge in bushings and insulators [Steiner, 1991]. RIV is a narrow
band single input detector. Therefore, its long time-constant causes it to behave somewhat
like an integrator, providing only a measurement of the average value of the partial
discharge magnitude. The main disadvantage of RIV is that no general relationship exists
between the measured voltage and the average partial discharge level in the equipment.

1.4.2.3 Single channel analyser 

The partial discharge detecting techniques described earlier can be classified as go/no-go
methods in the sense that they only indicate the presence or absence of discharge above
certain levels in picocoulombs (pC) or microvolts (µV) in the insulating system undergoing
test. Additional quantitative information on the partial discharge process concerning the
partial discharge pulse pattern density was obtained when measurements were made of the
discharge rate. This could be done by using electronic counters to count partial discharge
pulses with amplitudes above a certain pre-set voltage, or by the additional use of a
discriminator to count only pulses within a certain range of levels. This method is commonly
known as the single channel analyser [Bartnikas et al, 1969]. With the help of the single
channel analyser, measurements are carried out of the discharge rate as a function of the
partial discharge magnitude. This could be done either by varying the levels of the upper and
lower discriminators or by making use of a limited number of single channel units with
different but fixed level settings. When the levels of the upper and lower discriminators were
changed, the single channel analyser proved to be susceptible to error due to time dependent
changes in the partial discharge pattern itself, whereas the use of a limited number of single
channel analysers was characterised by extremely poor pulse height resolution.
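
The counting performed by a single channel analyser can be sketched as follows. This is a software illustration of the idea only, with hypothetical function names and amplitudes; a real instrument counts pulses in hardware while they arrive.

```python
def count_in_window(pulse_amplitudes, lower, upper=None):
    """Count pulses at or above `lower` and, if given, below `upper`."""
    if upper is None:
        # Simple counter: every pulse above one pre-set level.
        return sum(1 for a in pulse_amplitudes if a >= lower)
    # Discriminator window: only pulses between the two levels.
    return sum(1 for a in pulse_amplitudes if lower <= a < upper)

def sweep_window(pulse_amplitudes, edges):
    """Step the discriminator window through successive level settings."""
    return [count_in_window(pulse_amplitudes, lo, hi)
            for lo, hi in zip(edges[:-1], edges[1:])]

# Hypothetical pulse amplitudes in pC.
amplitudes = [5, 12, 18, 25, 31, 12]
above_10 = count_in_window(amplitudes, 10)
between_10_and_20 = count_in_window(amplitudes, 10, 20)
rate_vs_level = sweep_window(amplitudes, [0, 10, 20, 30, 40])
```

In this sketch the sweep reuses one stored record. In a real measurement each window setting sees a different time interval, which is why time dependent changes in the discharge pattern distort the result.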

1.4.2.4 Multi-channel analyser

With the availability of the multi-channel pulse height analyser, the above difficulties could be
resolved. A multi-channel pulse height analyser provides essentially a distribution of the
frequency of occurrence of the partial discharge pulse heights of a train of pulses associated
with a given partial discharge pattern [Bartnikas, 1973a; Bartnikas, 1973b]. Commercially
available pulse height analysers typically offer 128, 256, 512, 1024, 2048,
4096 or 8192 channels, coupled with the required memory and the capability of digitising the
input pulse data. For example, in a 1024 channel pulse height analyser, the incoming partial
discharge pulses of various amplitudes are sorted by an analog to digital converter into one
of 1024 possible heights. Thus, the multi-channel analyser provides a statistical distribution
in which individual channels 1, 2, 3, ..., 1024 correspond to particular pulse
charges. The total number of partial discharge pulses counted in each channel equals the
number of discrete pulses whose peak amplitude corresponds to the particular channel. The
count time interval over which each partial discharge pulse train is analysed can be made
sufficiently long to obtain a truly statistical amplitude distribution of the pulse train. Each
channel provides a point reading, and a curve drawn through the points represents a plot of
the number of partial discharge pulses versus the apparent charge. An RC pulse shaping circuit
can be employed to provide the multi-channel pulse height analyser with smooth
unidirectional pulses. The horizontal axis of the multi-channel analyser has to be calibrated
in µV or pC, i.e. each channel corresponds to a certain voltage threshold or level. Prior to
each measurement, the overall amplification of the corona detection circuit must be
suitably adjusted to fix the upper and lower amplitude spectrum limits between which a
particular partial discharge pulse train is to be analysed. By using a multi-channel pulse
height analyser one could get a family of partial discharge pulse height distributions as a
function of applied voltage or time.
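
The sorting of pulses into channels can be sketched in a few lines. The example below is a software analogy only: the function name, the linear channel calibration between `lower_pc` and `upper_pc`, and the sample values are assumptions, and pulses outside the calibrated range are simply ignored.

```python
import numpy as np

def pulse_height_spectrum(amplitudes_pc, lower_pc, upper_pc, n_channels=1024):
    """Return channel centres (pC) and the pulse count in each channel."""
    amplitudes = np.asarray(amplitudes_pc, dtype=float)
    # Keep only pulses between the lower and upper amplitude spectrum limits.
    in_range = amplitudes[(amplitudes >= lower_pc) & (amplitudes < upper_pc)]
    width = (upper_pc - lower_pc) / n_channels
    # Quantise each pulse height to a channel number, as the ADC would.
    channels = np.minimum(((in_range - lower_pc) / width).astype(int), n_channels - 1)
    counts = np.bincount(channels, minlength=n_channels)
    centres = lower_pc + (np.arange(n_channels) + 0.5) * width
    return centres, counts

# Hypothetical unipolar pulse train, sorted into 4 channels of 1 pC each.
centres, counts = pulse_height_spectrum([0.5, 1.5, 1.6, 3.9, 4.0, -1.0], 0.0, 4.0, 4)
```

Plotting `counts` against `centres` gives the curve of the number of pulses versus apparent charge described above.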

However, it should be noted that the multi-channel analyser performs analysis of unipolar
pulses only. Therefore, pulse amplitude distributions for positive and negative discharges
can only be obtained by two distinct measurements carried out at different times.
Moreover, it does not make efficient use of the available information, since it filters the data
received and does not provide a complete permanent record of all partial discharge events
during the time of observation. On the other hand, the multi-channel analyser is important in
DC testing because DC partial discharge pulses are difficult to observe if there is no means
available to store them. They are difficult to observe on a direct display because the
time separation of the pulses is random and quite often these time spans are large.

1.4.2.5 Pulse detection circuits 

Broadband pulse detection is by far the most common method for measuring partial 
discharge. The typical pulse detection circuit is implemented using a high voltage coupling 
capacitor and a measuring impedance. The coupling capacitor must be partial discharge 
free up to the maximum test voltage of the system. The measuring impedance may be 
connected to the sample in two ways: 

(i) In series with the test sample. 

(ii) In series with the coupling capacitor. 

Both methods are electrically equivalent, since the same partial discharge pulses pass through
the measuring impedance. In practice, however, it is often placed in series with the coupling
capacitor for the following reasons:

(i) The large charging current of the test sample does not pass through the 
measuring impedance. 

(ii) To protect the measuring impedance and measuring circuit from over voltage in 
case of the test sample failure. 

(iii) It is particularly useful when it is not practical to break the ground connection 
to the test sample. 

The measuring impedances commonly used are:

(i) A resistor shunted by a parasitic capacitor (RC impedance). 

(ii) An oscillatory circuit (RLC impedance). 

The higher sensitivity of the RLC type pulse detection method was recognised as early as
1933. It is used essentially as a resonant type of corona pulse detection circuit which is
set into oscillation at its natural frequency by a discharge transfer within the test sample.
The capacitance of the measuring circuit is small. It just represents a stray capacitance.
Also, the inductance of the measuring circuit has a low impedance at power frequency, so
that the AC excitation voltage drops across the coupling capacitor rather than across the
measuring impedance. The RLC impedance acts as a band-pass filter and transforms the very
short partial discharge pulses, superimposed on the power frequency component, into
decaying cosine transient pulses. A
high resonance frequency of the measuring impedance is required to ensure a high pulse
resolution, and therefore integrated pulses of short duration, but at the same time a
small value of the upper cut-off frequency is necessary to perform a good current pulse
integration. The lower cut-off frequency is suggested to be greater than 10 kHz. The upper
cut-off frequency ranges from 200 to 400 kHz to limit high frequency noise interference with
the detected signal [Capponi et al, 1992]. The high frequency noise is picked up from
commercial AM radio transmissions, which are in the frequency band of 540 to 1600 kHz.
The factor that influences the choice of the lower cut-off frequency is the need to eliminate
low frequency interference such as the harmonic distortion in the power source and
switching noise.
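
The decaying oscillatory response can be sketched for an idealised case. The example assumes a parallel RLC measuring impedance onto which a discharge deposits a charge q instantaneously, with no initial inductor current; the component values are hypothetical and were chosen only so that the resonance falls inside the 10 kHz to 400 kHz band quoted above and below the AM broadcast band.

```python
import math

def resonance_hz(l, c):
    """Natural (undamped) resonance frequency of the LC pair in Hz."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))

def rlc_response(q, r, l, c, t):
    """Voltage across a parallel RLC after a charge q is dumped on C at t = 0."""
    alpha = 1.0 / (2.0 * r * c)          # damping coefficient
    omega0 = 1.0 / math.sqrt(l * c)      # undamped angular frequency
    if alpha >= omega0:
        raise ValueError("circuit is not underdamped: no oscillation")
    omega_d = math.sqrt(omega0 ** 2 - alpha ** 2)
    v0 = q / c                           # initial step produced by the charge
    return v0 * math.exp(-alpha * t) * (
        math.cos(omega_d * t) - (alpha / omega_d) * math.sin(omega_d * t))

# Hypothetical values: 10 pC pulse, R = 10 kOhm, L = 1 mH, C = 1 nF.
f0 = resonance_hz(1e-3, 1e-9)
v_start = rlc_response(10e-12, 10e3, 1e-3, 1e-9, 0.0)
in_band = 10e3 < f0 < 400e3
```

The peak of the response is proportional to q, which is why the oscillation amplitude can be calibrated in terms of apparent charge.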

The direct display method is the most common method for displaying broadband partial
discharge pulses. Broadband partial discharge detection systems are widely used as a result
of several advantages compared to narrow band techniques. The most important advantage
is that the individual partial discharge pulses are measured with this method rather than an
average value. Another important advantage is the greater sensitivity which comes with a
wider bandwidth. With pulse detection systems, the operator can directly observe the pulse
shape along with the pulse polarity. This is not possible with narrow band systems. The
operator can also observe the distribution of pulses in phase, which is often useful in
identifying the nature of the discharge source. These types of measurements are readily
adapted to computer controlled systems.

For partial discharge measurements, the following two fundamental methods could be used:

(i) Balanced detection bridge method 

(ii) Direct measurement method. 
In the first case, partial discharge signals are obtained as the difference between the voltages
across two resistive impedances placed in two distinct branches, one containing the specimen
under test and the other a discharge free coupling capacitor having the same frequency
dependence of dielectric loss as the specimen. In this case, the displacement current and its
harmonics are suppressed by the common mode rejection of the bridge circuit. However, it
should be pointed out that this method is seldom used because of some drawbacks.
The method requires the specimen to be kept isolated from ground and, in order to get
the full advantage of the high common mode rejection ratio, a coupling capacitor
must be available that matches the specimen as far as the capacitance and dissipation factor
are concerned. These conditions cannot always be achieved in routine partial discharge
testing.

In the direct measurement method, current pulses in the leads of the sample are measured
directly. One of the main difficulties in performing partial discharge tests is that
discharges which occur in other parts of the test circuit, such as in the high voltage source,
high voltage leads, coupling capacitor, etc., interfere with the measurements. The impulses
caused by these discharges are difficult to distinguish from those of discharges in the test
object, so that they disturb the observation of the wanted discharges. The effect of these
external discharges can be partially suppressed by using filters or by using a partial
discharge free high voltage transformer and coupling capacitor.

1.5 Computerised Measurement and Automated Recognition of Partial Discharge

The data obtained using various multi-channel pulse analyser techniques can be stored and
then analysed by means of a computer. The current trend in this regard is to substitute
much of the multi-channel analyser hardware with computer oriented software. One of the
advantages of this method is the possibility of later improvement of the evaluations, since
the primary data are on permanent record in storage media. This is in contrast to the earlier
techniques, where a new evaluation required high voltage to be applied again to the test
object, which might have changed with time.

For many years, recognition of partial discharge patterns was performed by visual
inspection, i.e. by observation of partial discharge on an oscilloscope screen. The
interpretation of the patterns appearing on the ellipse depends upon the knowledge and
experience of the expert. It soon became apparent to workers in this field that the
measurement of total or cumulative pulse count was not capable of providing a complete
overview of the process occurring during the partial discharge. A measure of the discrete
pulse heights with their corresponding discharge rates was very much desirable. Therefore, in
addition to the usual measurement of the partial discharge inception and extinction voltages
as well as the maximum partial discharge pulse amplitude, it is useful to record other
partial discharge parameters, such as the maximum pulse magnitude, the
pulse repetition rate, the pulse separation time and the pulse relationship with respect to the
applied sinusoidal voltage. The aim of today’s computer diagnosis systems is to replace the
human expert. However, this requires a better diagnosis reliability. For visual display the
dimension is limited to only three parameters. Partial discharge events have a magnitude
and a phase position. They also depend upon time and the test voltage. Generally a personal
computer is able to handle as many quantities as necessary. Also, once the raw partial
discharge data from the interfacing circuit has been transferred into the computer memory,
the information can be analysed immediately and displayed as required. Therefore, it is
useful to develop more complex distinguishing features during the automated recognition
of partial discharge phenomena.

The development of computer based partial discharge measurement techniques was started
in the UK [Austin et al, 1976]. In their work, for the first time, the discharge information
associated with individual partial discharge pulses was recorded digitally using a mini
computer. Such information included the apparent discharge magnitude, its polarity, time
of occurrence and the applied voltage. The use of computers in partial discharge
measurements started the automation of partial discharge recognition [Gulski et al 1990,
1992 & 1995; Gulski 1993, 1995a & 1995b; Krivda 1995a]. As each defect has its own
particular effect on the degradation of insulation, it is important to know the correlation
between the discharge pattern and the kind of defect. Therefore, progress in the recognition
of partial discharge patterns and their correlation with the kind of defect is becoming
increasingly important in the field of quality control of insulating systems. One of the most
important advantages of a computer-aided measuring system is the ability to process a large
amount of information and to transform it into an understandable output form.

Many computer-aided systems have been developed for the measurement and 
understanding of partial discharge phenomena. In practice, the trend is in the direction of 
improving the recognition of discharge sources and the evaluation of the measuring results, 
which assist in determining the quality and the condition of the insulation system. In this 
regard, the phase resolved pulse height analyser is the most common approach [Okamoto et 
al 1982, 1985]. In this approach, pulse height analysis, which was common with multi-channel 
analysers, is performed as a function of the phase of the excitation voltage. Phase resolved 
pulse height analysis has several advantages in addition to the usual benefits of the 
broadband methods. The first advantage is that the measurements are stored in memory. 
Since other methods do not have memory, their measurements are lost. After the data collection, 
further analysis can be performed, such as signature analysis and various statistical 
operations. However, the information regarding the actual time of occurrence of each 
individual partial discharge pulse relative to the start of the test is lost after this conversion. 

1.6 Recognition Procedures 

The general recognition procedure is shown in Fig. 1.2: 



Fig. 1.2 Pattern recognition procedure 

Various modules of the recognition procedure, shown in the above figure, are described 
below for the partial discharge source classification. 

1.6.1 Measurements 

Partial discharge measured quantities can be divided into two main groups: 

1- Basic quantities, which are observed during a single voltage cycle. 

2- Deduced quantities, which are derived from the basic quantities in the first group 
observed throughout several voltage cycles. 

1.6.1.1 Basic quantities 

It is known that by using conventional detection methods (bandwidth 400 kHz) the 
electrical activity of partial discharge is represented by the following independent 
quantities [Kreuger et al, 1993; James et al, 1995]: 

- Discharge magnitude q_i. 

- Instantaneous applied voltage V_i. 

- The position of the discharge related to the phase angle of the test voltage φ_i. 

- Time of occurrence relative to the beginning of the test t_i. 

- Discharge polarity p_i (positive or negative). 

- Discharge energy w_i. 

1.6.1.2 Deduced quantities 

For obtaining the deduced quantities, the basic quantities have to be observed over a long 
time span. These quantities can be analysed as a function of time as well as of the phase angle 
of the applied voltage. The quantities as a function of time describe the change of the basic 
quantities with respect to time. It is known that variations in partial discharge occur, both 
in magnitude and in the temporal behaviour of the discharge. This variation is due to 
statistical variations in the discharge phenomenon itself and to changes in the discharge site. 
Therefore, to get information on the condition of the dielectric, the time behaviour in both 
the positive and negative half cycles of the inception voltage V_inc(t) and the number of 
discharges N_q(t) were processed [Gulski et al, 1992]. The quantities as a function of phase 
angle represent the recurrence of partial discharges related to their phase angle. Therefore 
the voltage cycle was divided into phase windows representing the phase angle axis 
(0-360°). If the observation takes place over several voltage cycles, the following four 
quantities can be determined in each phase window [Gulski et al, 1992; Kreuger et al, 
1993]: 

- Sum of the discharge magnitudes 

- Number of discharges 

- Average value of discharges 

- Maximum value of discharge 

These quantities, observed throughout the whole angle axis, result in distributions of 
recurrence as a function of phase angle. 
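
As a rough illustration of the four per-window quantities listed above, the following Python sketch accumulates them from a list of pulses. The function name, the default of 20 windows and the use of NumPy are assumptions made for illustration only; magnitudes are assumed to be non-negative (absolute values).

```python
import numpy as np

def phase_window_quantities(phase_deg, magnitude, n_windows=20):
    """Return per-window sum, count, mean and maximum of discharge magnitudes.

    phase_deg : phase angle of each pulse in degrees (0-360)
    magnitude : non-negative discharge magnitude of each pulse
    """
    phase_deg = np.asarray(phase_deg, dtype=float) % 360.0
    magnitude = np.asarray(magnitude, dtype=float)
    # Index of the phase window that each pulse falls into.
    idx = np.minimum((phase_deg / 360.0 * n_windows).astype(int), n_windows - 1)
    q_sum = np.bincount(idx, weights=magnitude, minlength=n_windows)
    count = np.bincount(idx, minlength=n_windows)
    # Mean is left at zero for windows that contain no pulses.
    q_mean = np.divide(q_sum, count, out=np.zeros(n_windows), where=count > 0)
    q_max = np.zeros(n_windows)
    np.maximum.at(q_max, idx, magnitude)  # unbuffered per-window maximum
    return q_sum, count, q_mean, q_max
```

Calling the function on pulses collected over several voltage cycles gives four arrays, one value per phase window, which correspond to the distributions of recurrence described above.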

1.6.2 Patterns for partial discharge recognition 

Partial discharge measurements can be performed in many ways. The measurements give 
rise to patterns. By measuring charge displacement in the leads, in the form of current 
pulses, partial discharge patterns can be observed in the form of various discharge 
distributions. These comprise the three dimensional distribution Hn(q,φ) and the two 
dimensional distributions derived from it, as follows: 

- Hn(q,φ): The number of pulses as a function of magnitude and phase angle. 

- Hqmax(φ): The maximum pulse height distribution as a function of phase angle. 

- Hqm(φ): The mean pulse height distribution as a function of phase angle. 

- Hn(φ): The pulse count distribution as a function of phase angle. 

- Hn(q): The number of discharges as a function of discharge magnitude. 

- Hn(p): The number of discharges as a function of discharge energy. 


These distributions have proved to be useful for the recognition of partial discharge 
sources. It has also been observed that these distributions can change significantly during 
ageing of the insulation [Krivda et al, 1994; Gulski et al, 1995]. 


1.6.3 Feature extraction for partial discharge recognition 

The aim of feature extraction is to reduce the dimensionality of the original partial 
discharge patterns by calculating certain features from the patterns. The number of features 
should be as low as possible: the lower the number of features, the faster the 
classification. Most of the features are based on the three dimensional distribution Hn(q,φ). 
The number of pulses in each window of the Hn(q,φ) distribution, measured for a certain 
partial discharge source, can be used to generate a number of features equal to the number of 
windows, which can in turn be used to identify the source. However, with a high 
resolution distribution, i.e. a large number of phase windows and magnitude windows, a 
large number of features has to be used for recognition. Even with a low resolution, 
the number of features required is high. To reduce the number of features to a reasonable 
number, the following methods have been used in the literature. 

1.6.3.1 Statistical features 

All data which have the common feature that they are affected by chance should be 
analysed by statistical operators. Such data are influenced by effects which one cannot 
predict because they result from factors that cannot be controlled 
or, often, even enumerated. Partial discharge measurements are an example of this kind of 
data. Therefore, different statistical parameters such as skewness, kurtosis, asymmetry etc. 
have been used to characterise the partial discharge distributions. 


(i) Skewness (Sk) 
It is defined as: 

Sk = \frac{\sum_i (x_i - \mu)^3 \, p_i}{\sigma^3}    (1.1) 

where x_i is the recorded value, p_i is the probability of appearance of that 
value x_i in phase window i, μ is the mean value and σ is the standard deviation. It 
represents the asymmetry of the distribution, as shown in Fig. 1.3. 

- If the distribution is symmetric, Sk=0 

- If it is asymmetric to the left, Sk >0 

- If it is asymmetric to the right, Sk <0 



Fig. 1.3: Distributions with different skewness (panels: Sk < 0, Sk = 0, Sk > 0) 

(ii) Kurtosis 

It represents the sharpness of the distribution. It is defined as: 

Ku = \frac{\sum_i (x_i - \mu)^4 \, p_i}{\sigma^4} - 3    (1.2) 


As shown in Fig. 1.4, 

- If the distribution has the same sharpness as the normal distribution, Ku = 0 

- If it is sharper than the normal distribution, Ku > 0 

- If it is flatter than the normal distribution, Ku < 0 

Fig. 1.4: Distributions with different kurtosis 
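
The two shape parameters can be sketched in a few lines of Python. This is a minimal sketch, assuming the usual moment definitions Sk = Σ(x_i - μ)³ p_i / σ³ and Ku = Σ(x_i - μ)⁴ p_i / σ⁴ - 3, with the weights normalised so that the p_i sum to one. The function name and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def skewness_kurtosis(x, weights):
    """Skewness and kurtosis of a discrete distribution.

    x       : recorded values x_i
    weights : frequency of appearance of each x_i (need not be normalised)
    """
    x = np.asarray(x, dtype=float)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                              # probabilities p_i
    mu = np.sum(x * p)                           # mean value
    sigma = np.sqrt(np.sum((x - mu) ** 2 * p))   # standard deviation
    sk = np.sum((x - mu) ** 3 * p) / sigma ** 3
    ku = np.sum((x - mu) ** 4 * p) / sigma ** 4 - 3.0
    return sk, ku
```

With these definitions a symmetric distribution gives Sk = 0, and a distribution whose peak lies to the left of its long tail gives Sk > 0, in line with the sign convention stated above.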


(iii) Cross-correlation factor (Cc) 

The cross correlation factor indicates the difference in shape of the partial discharge pattern in 
the positive and negative half cycles. If the shapes are the same (but not necessarily equal), 
Cc = 1. If they differ completely, Cc = 0. It is defined as, 

Cc = \frac{\sum x_i y_i - \sum x_i \sum y_i / n}{\sqrt{\left[\sum x_i^2 - (\sum x_i)^2 / n\right] \left[\sum y_i^2 - (\sum y_i)^2 / n\right]}}    (1.3) 

where x_i is the mean discharge magnitude in window i of the positive half cycle and y_i the 
mean discharge magnitude in the corresponding window of the negative half cycle; n is the 
number of phase windows per half cycle. 


(iv) Discharge factor (Q) 

The discharge factor describes the difference in the mean discharge levels in the positive 
and negative distributions. It is defined as, 

Q = \frac{Q^{-} / N^{-}}{Q^{+} / N^{+}}    (1.4) 

where, 

Q^+ and Q^- are the sums of discharges of the mean pulse height distribution in the positive 
and negative half cycles, respectively. 

N^+ and N^- are the numbers of discharges of the mean pulse height distribution in the positive 
and negative half cycles, respectively. 

Q = 1 denotes equal discharge levels in the two half cycles and Q = 0 denotes a large 
difference between them. 


(v) Modified cross correlation factor (MCc) 

The modified cross correlation factor is defined as the product of the discharge factor and 
the cross correlation factor. 

MCc = Q · Cc (1.5) 
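
The three half-cycle comparison features can be sketched as follows. This is an illustrative sketch only: it assumes that Cc is the usual correlation coefficient between the window values of the two half cycles and that Q is the ratio of the mean discharge level in the negative half cycle to that in the positive half cycle. The function names are hypothetical.

```python
import numpy as np

def cross_correlation(x, y):
    """Cc between the positive (x) and negative (y) half-cycle window values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size  # number of phase windows per half cycle
    num = np.sum(x * y) - np.sum(x) * np.sum(y) / n
    den = np.sqrt((np.sum(x ** 2) - np.sum(x) ** 2 / n)
                  * (np.sum(y ** 2) - np.sum(y) ** 2 / n))
    return num / den

def discharge_factor(q_pos, n_pos, q_neg, n_neg):
    """Q: mean discharge level of the negative half cycle over the positive one."""
    return (q_neg / n_neg) / (q_pos / n_pos)

def modified_cross_correlation(q_factor, cc):
    """MCc is the product of the discharge factor and the cross correlation factor."""
    return q_factor * cc
```

For two half-cycle distributions with the same shape but different heights, Cc is one while Q differs from one, so MCc reflects both the shape and the level of the discharges.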


(vi) Number of peaks 

It is used to distinguish between distributions with a single peak and distributions with 
several peaks. 

To improve the discrimination of these features, it has been suggested that they should be 
calculated for the positive and negative half cycles separately. For example, skewness, 
kurtosis and number of peaks can be calculated for each half cycle individually, while the 
cross correlation factor can be calculated to investigate the correlation between the 
distributions in the positive and negative half cycles. This gives seven features per 
distribution, so that for the three 
distributions Hn(φ), Hqm(φ) and Hqmax(φ) one gets 21 features. 

1.6.3.2 Fractal geometry 

Fractals provide a proper mathematical framework to study irregular and complex 
shapes found in nature, e.g. trees, hills, clouds, etc. The geometry of fractals has been used 
to extract features from the three dimensional partial discharge patterns acquired using a 
phase resolved partial discharge analyser (Satish et al, 1995; Krivda, 1995a; Krivda et al, 
1995b; Meijer et al, 1998). The following two fractal features have been used: 

(i) Fractal dimension to characterise the surface roughness. 

(ii) Lacunarity to describe the surface denseness. 

The calculation of both these features depends upon the number of cubes of side l which is 
necessary to cover the fractal surface. To calculate this number, it is assumed that p(m,l) is 
the probability that there are m points within a cube of size l which is centred about a point 
on the fractal surface. p(m,l) is normalised, as below, for all l: 

\sum_{m=1}^{N} p(m,l) = 1    (1.6) 

where N is the number of possible points within the cube. Let S be the total number of 
surface points. If one overlays the surface with cubes of side l, then the number of cubes 
with m points inside the cube is (S/m) p(m,l). Therefore, the expected total number of cubes 
needed to cover the whole surface is: 

N(l) = \sum_{m=1}^{N} \frac{S}{m} \, p(m,l) = S \sum_{m=1}^{N} \frac{1}{m} \, p(m,l)    (1.7) 

To calculate the fractal dimension (D), N(l) should be calculated for various values of l 
and a least squares line fitted to the points [log(l), log(N(l))]; D is obtained from the slope 
of that line (since N(l) is proportional to l^(-D), the slope equals -D). 
Lacunarity Λ(l) can be calculated by using the second order statistics of p(m,l) as 
follows: 

\Lambda(l) = \frac{M^2(l) - [M(l)]^2}{[M(l)]^2}    (1.8) 

M(l) = \sum_{m=1}^{N} m \, p(m,l)    (1.9) 

M^2(l) = \sum_{m=1}^{N} m^2 \, p(m,l)    (1.10) 
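
A minimal numerical sketch of these two fractal features is given below. It assumes that p(m,l) is already available as an array for each cube size l (its estimation from the measured surface is not shown), that the expected cube count is N(l) = S Σ p(m,l)/m, and that lacunarity is the normalised variance of the first two moments of p(m,l). The function names and the use of NumPy are illustrative assumptions.

```python
import numpy as np

def expected_cube_count(p, total_points):
    """N(l) = S * sum_m p(m,l) / m, with p[m-1] holding p(m,l) for m = 1..N."""
    p = np.asarray(p, dtype=float)
    m = np.arange(1, p.size + 1)
    return total_points * np.sum(p / m)

def lacunarity(p):
    """Lacunarity from the first and second moments of p(m,l)."""
    p = np.asarray(p, dtype=float)
    m = np.arange(1, p.size + 1)
    first = np.sum(m * p)          # M(l)
    second = np.sum(m ** 2 * p)    # M^2(l)
    return (second - first ** 2) / first ** 2

def fractal_dimension(cube_sizes, cube_counts):
    """Least squares fit of log N(l) against log l; D is minus the slope."""
    slope, _intercept = np.polyfit(np.log(cube_sizes), np.log(cube_counts), 1)
    return -slope
```

In use, `expected_cube_count` would be evaluated for several cube sizes and the resulting counts passed to `fractal_dimension`, while `lacunarity` is evaluated at a chosen cube size.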


1.6.4 Classification 

The aim of classification is to assign a label to a partial discharge pattern of unknown 
origin on the basis of previously collected patterns with known labels, such as treeing discharge, 
corona etc. There are a number of methods available for classification. Some of these 
include: 

- Artificial neural network based classifiers (ANN) 





- Conventional classifiers 

1.6.4.1 Artificial neural networks based classifiers 

The classification potential of an ANN can be visualised by understanding how it classifies 
the finger prints. The back-propagation network (with one hidden layer and a sigmoid 
transfer function) separates data by hyperplanes (lines in 2-d space, planes in 3-d space, 
etc.). The hyperplanes are generated by neurons in the hidden layer (one hyperplane per 
neuron). Weight connections between the input layer and the hidden layer determine the 
slope and shift of the hyperplanes. Weight connections between the hidden layer and the output 
layer serve as logical functions which decide on which side of a hyperplane a test finger print 
lies. Finger prints of two defects can be separated by a network with two neurons in the 
hidden layer. A test finger print is then classified according to its position relative to 
the hyperplane. Such a classification procedure can, however, cause problems. A finger 
print of unknown origin, not belonging to any of these defects, could be 
assigned to a certain defect because the unknown pattern and that defect lie on the same 
side of the hyperplane. More neurons are therefore required in the hidden layer to separate 
the finger prints of the two defects from the surrounding space. However, in more than two 
dimensions, the structure of the data is unknown and it is difficult to estimate the number of 
neurons in the hidden layer. Also, when new defects have to be added for recognition to 
a previously trained network, the network has to be completely retrained. The ANN 
learning process is, in general, time consuming. 

Although the application of ANNs is reasonably well established, their application in the area of 
partial discharge is relatively new. An important aspect in the application of ANNs is the 
definition of the input information which is fed into them. The following paragraphs 
briefly review the ANNs and the types of input patterns which have been used for 
partial discharge classification. 

A three layer feed forward ANN with the back propagation learning algorithm has been used 
to distinguish automatically between partial discharge generated in an XLPE cable and 
noise (Suzuki et al, 1992). A cable sample of 5 m length was used to generate different 
patterns with and without partial discharge. Partial discharge was generated by using a 
metallic needle electrode with a tip radius of 5 μm which was inserted 1 mm into the 
insulation of the cable. The input patterns were related to the number of pulse counts (say 
n) in each pixel of the φ-q-n distribution. To reduce the input data, the φ-q-n distribution 
was divided into 20 (phase angle) x 32 (discharge magnitude) pixels. In addition, pixels 
for negative pulses in the positive half cycle and positive pulses in the negative half cycle 
were ignored as insignificant, which halves the 640 pixels. Therefore, the input patterns of the ANN 
consisted of a series of 320 input values. Three types of input patterns, φ-q-1, φ-q-n and 
φ-q-n', were used. In the case of the φ-q-1 pattern, the input value was taken to be equal to one if 
the number of pulses was greater than or equal to a given threshold value. In the case of 
φ-q-n, the input value was equal to the number of pulse counts. In the case of φ-q-n', the 
input value was equal to the actual number of pulses in the corresponding pixel plus a 
correction factor calculated from the adjacent 8 pixels in order to smooth the patterns. It 
was found that the ANN could easily discriminate partial discharge from noise by using 
φ-q-1 patterns. 
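
The three kinds of input pattern can be illustrated with a short Python sketch operating on a grid of pulse counts. The thresholded pattern follows the description above directly. The smoothed pattern is only a guess at the idea: the source does not state how the correction factor is computed from the 8 adjacent pixels, so the weighted neighbour sum used here (and the parameter `weight`) is a hypothetical choice.

```python
import numpy as np

def binary_pattern(counts, threshold=1):
    """Thresholded pattern: 1 where the pulse count reaches the threshold, else 0."""
    return (np.asarray(counts) >= threshold).astype(float)

def smoothed_pattern(counts, weight=0.125):
    """Count plus a correction from the 8 adjacent pixels (assumed form).

    The correction is taken as `weight` times the sum of the 8 neighbours;
    pixels outside the grid are treated as zero.
    """
    counts = np.asarray(counts, dtype=float)
    rows, cols = counts.shape
    padded = np.pad(counts, 1, mode="constant")
    neighbours = np.zeros_like(counts)
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the pixel itself
            neighbours += padded[1 + dr:1 + dr + rows, 1 + dc:1 + dc + cols]
    return counts + weight * neighbours
```

The raw count grid itself corresponds to the count-based pattern; flattening any of the three grids row by row gives the input vector fed to the network.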

An ANN using the back propagation method has been applied to discriminate partial 
discharge patterns before and after tree initiation from a needle weak point [Hozumi 
et al, 1992]. A metallic needle was inserted in epoxy resin to initiate the electric tree. The 
input patterns were based on the φ-q-n and φ-q distributions. The φ-q-n distribution was divided into 
20 windows for the phase angle and 16 windows for the partial discharge magnitude. 
Therefore, a φ-q-n pattern was composed of 320 values. The φ-q distribution indicates the mean partial 
discharge magnitude of the pulses in each phase window. The number of windows was 20, 
the width of each phase window being 18°. The network trained on 
φ-q-n patterns showed better discrimination performance than the one trained on φ-q 
patterns. A similar technique with the same input patterns was used to classify seven different partial 
discharge sources by using three dimensional patterns [Satish et al, 1994]. 
Another approach, based on 15 statistical parameters such as skewness, kurtosis, number of 
peaks and cross correlation, has been used to define the input pattern of the ANN [Gulski et al, 
1993]. These parameters were used to quantify the profiles of two phase position 
distributions: the mean pulse height distribution (q-φ), which shows the average partial 
discharge magnitude in each phase window as a function of the phase angle φ, and the pulse 
count distribution (n-φ), which shows the number of partial discharges in each phase 
window as a function of the phase angle φ. A back propagation network, a Kohonen self 
organising map and a learning vector quantization network were used for classification in 
this work. Two ways of applying ANNs were studied: (i) in the first case, a single ANN was 
used with the number of outputs equal to the number of partial discharge sources studied; (ii) 
in the second case, a separate ANN with one output was 
built for each partial discharge source. Thus in the second case M ANNs were necessary for the recognition of M partial 
discharge sources. For the first case, two types of ANNs were developed: the first had only 
two outputs (two partial discharge sources) while the second had 12 outputs (12 partial 
discharge sources). In the case of only two outputs, the back propagation network, the 
Kohonen self organising map and the learning vector quantization network correctly classified 
the test patterns taken from the set of patterns used for training. However, testing with 
unknown patterns resulted in a number of misclassifications. In the case of 12 outputs, the 
back propagation network provided satisfactory results for all the studied partial discharge 
sources compared to the other types of ANNs. When using M ANNs for M individual 
partial discharge sources, each network was trained to give unity output for the presence of 
a specific partial discharge source and zero for the other partial discharge sources, which 
are called the counter sources. However, it was observed that different outputs were obtained 
for the same test partial discharge source when the network was trained with two different 
counter partial discharge sources. 

Another approach, which was not based on these distributions but focussed on features that 
describe the shape of the partial discharge pulses, namely the apparent charge, rise time, 
fall time, width and area of the partial discharge pulse, has also been used [Mazroua et al, 
1993]. These five features together form the partial discharge pattern that was 
fed to a multilayer ANN trained using the back propagation algorithm. Artificially created 
cylindrical cavities of 2 mm diameter, with depths ranging between 1.0 and 2.0 mm, 
formed in acrylic disk specimens, were employed to test the capability of the ANN to 
discriminate between different partial discharge pulse patterns. Two different ANN structures 
were adopted to perform the classification task between three outputs (three depths): 
the first with a single hidden layer and the second with two hidden layers. It was 
observed that the two structures yielded the same results, with 75% accuracy when the training 
patterns were used for testing, whereas their discrimination performance decreased to 50% 
when the network was tested with 10 patterns which were not used for training. When 
the difference between the cavity depths under investigation increased, the 
classification accuracy increased to 95% for the training patterns and 70% for the patterns 
which were not used in the training. The ANN succeeded in discriminating between the 
patterns of the metallic electrode cavities and the cavities between the dielectric and 
electrodes with a success rate of 100% for both the training and the test data. The same 
features and the same ANN were extended to the recognition of discharge sources of 
different types, such as cavities and electric trees within the insulation system, as well as the 
recognition of changes in the partial discharge pulse shapes that were associated with 
deterioration or ageing effects within the defects undergoing discharge [Mazroua et al, 
1995]. A new feature, given by the product of the actual test voltage and the apparent 
charge, was added, and the width of the pulse was replaced by the product of the pulse 
width and the apparent charge, which resulted in six features. These were used to train three 
different ANNs [Mazroua et al, 1994], namely the multilayer perceptron, the nearest 
neighbour classifier and the learning vector quantization network. It was noticed that the recognition 
capabilities of the three ANNs were comparable. 

The back propagation feed forward ANN has also been used to discriminate the partial 
discharge of three kinds of electrode systems, which resulted in 24 different classes 
[Okamoto et al, 1995]. These groups were a needle-plane electrode system group, a 
spherical void electrode system group, and a crack void electrode system group. The input 
patterns, based on the φ-q-n distribution, were divided into 20 phase windows and 16 magnitude 
windows. Five patterns were used to represent each class. In order to examine the 
recognition ability of the network, two kinds of input pattern groups were used. One was a 
set of unlearned patterns and the other was a set of combination patterns of two different electrode 
groups. While the ANN could successfully recognise the patterns which were used for 
training, the classification accuracy for the unlearned patterns reduced to around 85%. In 
the case of a combination of two electrodes, there was no output other than the A and B outputs when 
the input patterns were a combination of an A electrode pattern and a B electrode pattern. 

1.6.4.2 Conventional classifiers 

There are a number of conventional classifiers available for classification [Kreuger et al, 
1993]: 

- Minimum distance classifier 
- Rate of recognition classifier 
- Centaur score classifier 

In the above methods, parameters such as the mean values and standard deviations of the 
features for different defects are calculated. Then the distance between a finger print of 
unknown origin and the known defect is calculated to assign the finger print to the proper 
defect. 

1.6.4.2.1 Minimum distance classifier 

The minimum distance (Euclidean distance) classifier computes the distance from the 
unknown pattern to the mean of each class, and assigns the unknown pattern to the class to 
which it is closest. The distance is given by [Tou and Gonzalez, 1974]: 

D_k = \sqrt{\sum_i (x_i - \mu_{ik})^2} 

where 

x_i is the i-th component of the classification feature vector (finger print), 

μ_ik is the i-th component of the mean feature vector belonging to a 
reference defect class k. 

The disadvantage of this approach is that the classifier cannot provide "I do not know" 
answers. So, a finger print which does not belong to any of the known problems may be 
misclassified as one of the known problems. 
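
The classifier and its main weakness can be shown in a few lines of Python. This is a minimal sketch, assuming the plain Euclidean distance between the finger print and each class mean; the function name, the dictionary layout and the class labels in the usage note are hypothetical.

```python
import numpy as np

def minimum_distance_classify(fingerprint, class_means):
    """Assign a feature vector to the class whose mean is nearest.

    fingerprint : feature vector x of the unknown pattern
    class_means : dict mapping a class label to its mean feature vector
    Returns the winning label and the distance to every class.
    """
    x = np.asarray(fingerprint, dtype=float)
    distances = {
        label: float(np.linalg.norm(x - np.asarray(mean, dtype=float)))
        for label, mean in class_means.items()
    }
    best = min(distances, key=distances.get)
    return best, distances
```

For example, with class means `{"corona": [0, 0], "surface": [3, 4]}` a finger print at `[100, 100]` is still assigned to "surface", although it lies far from both classes. This is the missing "I do not know" answer described above.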

1.6.4.2.2 Recognition rate 

The recognition rate is determined in the following steps [Kreuger et al, 1993]: 

- Several samples of known defects are taken and measured for discharge. 

- For each parameter the mean value is determined, and the standard deviation is 
derived from the scatter of these values. 

- Each parameter now has a most probable value between two limits (mean value - 
standard deviation and mean value + standard deviation). 

- The unknown defect is measured and the value of each operator is compared to that 
of the known defect. 

- If the difference is small, a hit is recorded. 

- In this way, all the parameters are compared and the number of results that 
coincide is recorded; this total number is called the ‘recognition rate’. 

Actually, recognition rate and Centaur score do not differ much in their discriminating 
power. 
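
The steps above can be sketched as follows. The source only says that a hit is recorded when the difference is small; this sketch assumes a hit means that the unknown value lies within one standard deviation of the reference mean. That criterion, the function name and the use of the sample standard deviation are assumptions.

```python
import numpy as np

def recognition_rate(unknown, reference_samples):
    """Count the parameters of an unknown defect that match a known defect.

    unknown           : parameter values measured on the unknown defect
    reference_samples : 2-D array, one row per sample of the known defect,
                        one column per parameter
    """
    ref = np.asarray(reference_samples, dtype=float)
    x = np.asarray(unknown, dtype=float)
    mean = ref.mean(axis=0)
    std = ref.std(axis=0, ddof=1)      # scatter of the reference samples
    hits = np.abs(x - mean) <= std     # assumed hit criterion: within one std
    return int(hits.sum())
```

The returned count can be compared across the known defects, and the unknown finger print is attributed to the defect with the largest number of hits.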

1.6.4.2.3 Centaur Score 

The Centaur method creates percentage contours in the form of hyperellipsoids (ellipses in 2-d 
space) around the mean value of the features for a particular defect [Kreuger et al, 1993]. 
The size and shape of the hyperellipsoid are determined by the standard deviation of each feature 
and the mutual correlation of the features. The Centaur method has been applied successfully 
for the recognition of created defects in insulation and in actual high voltage equipment. 
The use of the Centaur score method is, however, restricted to normally distributed data of 
a particular defect. By careful design of the data base for discharge recognition, e.g. by 
splitting a non normal distribution into several normal ones, this condition can be fulfilled 
in such a way that misclassification can be avoided. 

1.7 Partial Discharge Training Data 

A carefully designed data base for partial discharge classification should produce a high level 
of similarity between a finger print to be classified and a known insulation defect in the 
case of a correct recognition and a low level of similarity in all other cases. Several 
questions arise when creating the database. For example: 

- How many features are sufficient for recognition? 

- How many patterns of one partial discharge source are required for successful 
feature classification? 

Usually as many features as possible are extracted from the patterns. However, the higher 
the number of features, the longer the time required to calculate and classify 
them. Some features may be useless for pattern recognition because they have no 
discriminating power. A compromise between the two conflicting requirements can be 
achieved by selecting only the best features by maximising certain criteria. 

1.8 Motivation and Thesis Objectives 

It has been observed that partial discharges generated by ac voltage are significantly 
influenced by memory associated with the charge developed by preceding discharge events 
[Van Brunt et al, 1989]. Also, it is well known that partial discharge phenomena generally 
involve stochastic processes in the sense that when a partial discharge pulse is generated, 
it leaves behind an imprint that can influence the development of the subsequent discharge 
pulse. These imprints may, for example, be in the form of a negative ion space charge 
moving in the gas, molecules in an excited metastable state or charges on the surface. These 
conclusions have been drawn from the study of various conditional and unconditional 
pulse-height and pulse-time-separation distributions [Van Brunt et al, 1989, 1991, 1993]. It 
is assumed that the discharge phenomenon consists of a sequence of pulses denoted by the 
pulse amplitudes q_n and the time separations between successive pulses Δt_n. The time separation 
Δt_n is measured between the n-th and the (n+1)-th pulses. The possible conditional pulse-height 
and time-interval probability distribution functions may be of zero, first or second order. 
The zero order functions are obviously unconditional. For example, p_0(q_n) dq_n is the 
probability that any discharge pulse occurs with magnitude in the range q_n to q_n + dq_n, 
independent of n. The distributions p_1 and p_2 are conditional, e.g. p_1(q_n | Δt_{n-1}) dq_n is the 
probability that the n-th discharge pulse, for arbitrary n, will have an amplitude between q_n and 
q_n + dq_n when its time separation from the preceding pulse is restricted to the value Δt_{n-1}, 
and p_2(q_n | Δt_{n-1}, q_{n-1}) dq_n is the same, with the added restriction that the amplitude of 
the preceding pulse is also fixed at a value q_{n-1}. An immediate indication that q_n and Δt_{n-1} 
are not independent can be drawn from the observation that a measured conditional 
distribution is not equal to the corresponding unconditional distribution. Also, it is found 
that the conditional distribution changes its shape and position for different values of Δt_{n-1}. 
Therefore, it is in the nature of the phenomenon that a significant correlation exists between 
successive discharge pulses. Hence, one cannot assume that the pulse amplitude and the time 
separation from the previous pulse are independent random variables. 
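
A simple way to look for such memory effects in recorded data is to compare the mean pulse amplitude conditioned on the preceding time separation with the unconditional mean. The sketch below is illustrative only and is not the procedure of the cited studies. The function name, the quantile binning and the use of the mean (rather than the full conditional distribution) are assumptions.

```python
import numpy as np

def conditional_mean_amplitude(q, t, n_bins=4):
    """Mean amplitude q_n, conditioned on the separation from the preceding pulse.

    q : pulse amplitudes in order of occurrence
    t : corresponding times of occurrence
    Returns the bin edges of the separation, the conditional mean per bin
    and the unconditional mean of the same pulses.
    """
    q = np.asarray(q, dtype=float)
    t = np.asarray(t, dtype=float)
    q_n = q[1:]             # pulses that have a predecessor
    dt_prev = np.diff(t)    # separation of each such pulse from its predecessor
    edges = np.quantile(dt_prev, np.linspace(0.0, 1.0, n_bins + 1))
    idx = np.clip(np.searchsorted(edges, dt_prev, side="right") - 1, 0, n_bins - 1)
    cond = np.array([
        q_n[idx == b].mean() if np.any(idx == b) else np.nan
        for b in range(n_bins)
    ])
    return edges, cond, q_n.mean()
```

If the amplitude and the preceding separation were independent, the conditional means would all be close to the unconditional mean; a systematic trend across the bins points to memory between successive pulses.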

It is clear from the above that the phase resolved pulse height distributions contain no 
information about the correlation between successive pulses in the same half cycle and 
therefore provide no information about the memory propagation effect. Therefore, the main 
aim of this thesis is to explore another effective technique which can take the memory 
effect into consideration and eliminate the averaging effect of these distributions, as well as 
give an overview of every single partial discharge pulse of the whole measurement. 
Texture analysis has been successfully applied in the image processing and remote sensing 
fields. However, it is yet to be tried for partial discharge pattern recognition. In the 
present work, an attempt has been made to apply texture analysis algorithms to 
discriminate different types of partial discharge sources. The main aims of the thesis are to 
attempt the following: 


(a) Investigate the discriminating power of different texture analysis algorithms for 
partial discharge source classification. 

(b) Compare the discriminating power of the texture analysis algorithms with the 
discriminating power of the (q-φ-n) distributions method for partial discharge 
source classification. 

(c) Determine the relative discriminating power of each feature individually. 

(d) Determine the minimum number of features used at a time to achieve a good 
classification accuracy. 

(e) Determine the best feature selection technique to minimise the number of 
features which can be used to classify different partial discharge sources. 

1.9 Thesis Organisation 

The work reported in this thesis is organised in nine chapters: 

The present chapter introduces the partial discharge phenomenon, reviews the conventional 
methods for partial discharge classification, limitations of these methods and briefly 
outlines the work carried out in this thesis. 

Chapter 2 presents the theoretical background of four different texture analysis 
algorithms. The first three algorithms are in the spatial domain while the last one is in 
the frequency domain. These algorithms are the spatial gray level dependence method 
(SGLDM), the gray level difference histogram method (GLDHM), the gray level run length 
method (GLRLM), and the power spectral method (PSM), respectively. 

Chapter 3 describes the experimental set up used for partial discharge measurements and 
the samples used to simulate the partial discharge sources. The standard partial discharge 
sources, which have been used in this study to evaluate the discriminating power of the 
texture analysis algorithms, include glow corona, streamer corona, surface discharge, 
internal discharge, single protrusion and multi protrusions, all of which were created in 
the laboratory. 

Chapter 4 utilises the minimum distance classifier for partial discharge classification using 
the features based on the texture analysis algorithms. The classification accuracy of each 
algorithm has been investigated, and the classification accuracy of each feature 
individually, as well as of combinations of two features at a time, has been studied. 

Chapter 5 presents the application of a direct feature selection technique in the form of 
transformed divergence analysis. Transformed divergence has been used to measure the 
distance between the classes in the feature space. The optimum features have been selected 
according to the maximum separation between the different partial discharge classes. 

Chapter 6 investigates the use of a non-parametric classifier in the form of a back 
propagation artificial neural network (ANN). The application of the ANN was mainly 
motivated by the observation that some of the partial discharge sources are not normally 
distributed. Thus the assumptions of the minimum distance (Mahalanobis distance) 
classifier and the transformed divergence were not satisfied. A three layer feed forward 
ANN has been developed to classify different partial discharge sources. 

After the discriminating power of the texture analysis algorithms is established, in 
Chapter 7 the GLDHM features have been used to distinguish two partial discharge sources 
generated in a high voltage cable, as a practical example. A paper insulated lead covered 
belted cable has been used to generate internal discharge, which takes place in the 
microvoids within the insulating paper. Surface discharge at the end terminal of the cable 
is also produced under unprevented conditions. 

Chapter 8 describes the application of an indirect method to reduce the number of 
features by mapping the original set of features into a new, smaller set of features in 
which the separation between the classes is maximum. The principal component 
transformation of the features of the four texture analysis algorithms has been 
investigated to determine the classification accuracy of the main principal components. 

Chapter 9 presents a summary of the main conclusions of the thesis and includes a few 
recommendations for further research work. 

Chapter 2 

Basics of Texture Analysis 


2.1 Introduction 

From the discussion in Chapter 1, it is observed that the effect of the memory propagation 
between successive partial discharge pulses is not taken into consideration by the 
phase resolved pulse height analysis. Memory propagation means the dependence of 
the magnitude of a particular pulse on the magnitude of the preceding pulse and the time 
separation between the two pulses. Texture analysis algorithms mainly investigate the gray 
level variation in images [Haralick et al, 1973]. In other words, they investigate the relative 
magnitude of gray levels and their spatial distribution in images. Therefore, texture analysis 
algorithms are expected to create improved discriminating features between different 
partial discharge sources by investigating the relationship between adjacent partial 
discharge pulses. The memory propagation between successive pulses can be considered in 
the form of the relative pulse magnitudes as well as the spatial distribution of these pulses. 

Since the application of texture analysis to partial discharge source classification is 
new, the objective of this chapter is to develop an understanding of the terminology being 
used and the conceptual background of texture analysis. A careful review of the theory of 
texture is presented in this chapter. 

2.2 Background 

Application of texture analysis has always been an interdisciplinary field of endeavor. It 
has involved people from remote sensing, medicine, metallurgy, computer science, and 
military applications. In remote sensing, texture analysis has been used for land use 
studies, terrain classification, cloud analysis, and sea ice detection. In medical science, 
it has found applications in chest, dental, and other x-ray pictures and in chromosomal 
analysis. In metallurgy, it has been used to study electron micrographs of various metal 
and alloy samples. In computer science, it has been used in texture-based techniques for 
efficient automation. Military applications use texture information for the detection of 
targets. In fact, it is extremely difficult to list all possible uses of texture analysis, 
because texture has always been a part of our physical impression of the visual world. The 
following sections outline, in brief, the concepts and developments in texture analysis. 

2.3 Concepts in Texture Analysis 

The information content of an image can be defined in terms of three fundamental features: 
spectral tone, texture, and context. Spectral features describe tonal variations in spectral 
bands of the electromagnetic spectrum. Texture features contain information about the 
spatial distribution of spectral variations, and contextual features provide adjacency 
information in an image area. Haralick et al (1973) suggested that textural and spectral 
properties are present in an image simultaneously but, under a given condition, one 
property can dominate the other. When a small region has a wide variation in gray level, 
the dominant property is texture for that area. If the area has very little variation in gray 
values, tone becomes the main property. In other words, as the number of distinguishable 
discrete tonal features decreases, the tonal properties dominate, whereas textural properties 
dominate as the number of distinguishable discrete tonal features increases. 

2.3.1 The notion of texture 

Hawkins (1970) suggested that the notion of texture depends on three things: 

(i) Repetition of some local order over a region which is large compared to the order’s 
size. 

(ii) Non-random arrangement of elementary parts in the local order 

(iii) Elementary parts of the local order being roughly uniform entities and having 
approximately similar dimensions everywhere within the textured region 

2.3.2 Terms and definitions in texture analysis 

It has been an extremely difficult task for researchers to define the meaning of texture. 
Everyone understands the concept of texture, but the term itself is extremely difficult to 
define. Generally, researchers have defined it qualitatively and have tried to use numerical 
methods based on the quality of the texture they want to extract. Qualitatively, texture is 
described by various terms like coarseness, contrast, directionality, line-likeness, 
regularity and roughness (Tamura et al, 1978). Coarseness depends upon the pattern and size 
of texture elements. Contrast depends upon the ratio of black and white area, sharpness of 
edges and period of repeating patterns. Directionality depends upon the shape of the 
elements and placement rules. Line-likeness refers to the shape of the primitives of the 
textures and supplements coarseness, contrast, and directionality. Regularity relates to the 
variation in placement rules of the primitives or texture elements. If the placement rules are 
fixed and there is less variation in tone, the texture is said to be regular and coarse. But if 
the placement rules for texture elements are less fixed and there is more variation in gray 
levels, the texture becomes irregular and fine. Roughness is a mental concept related to the 
real world of three dimensional objects. 

Apart from these descriptions, texture is essentially a resolution dependent phenomenon. 
This implies that a change in the scale of imagery will change the whole perception of 
coarseness, contrast, and directionality of the texture. Thus a texture which is fine at a 
particular resolution may appear coarse with a change in the resolution of the imagery. In 
totality, image texture can be described as an area phenomenon, rather than a point 
phenomenon, consisting of many texture elements which can be statistically described with 
some placement rules. In nature, the spatial organization of these texture elements gives 
rise to deterministic as well as random textures. A deterministic texture is one in which 
the texture elements can be strictly represented by some fixed placement rules. A random 
texture is one in which the texture elements follow some statistical laws and are stochastic 
in nature. With the above background, it is possible to introduce some definitions of 
texture: 

- Texture relates to the relationships between gray levels in neighboring resolution cells 
which contribute to the overall appearance or visual characteristics of an image 
(Thomas, 1977). 

- Texture could be defined as a structure composed of a large number of more or less 
ordered similar elements or patterns without one of these drawing special attention, so 
that a global unitary impression is offered to the observer (Gool et al, 1985). 

- Visual texture refers to the impression of roughness or smoothness created by tone or 
repetition of visual pattern across the surface (Irons and Peterson, 1981). 

- Texture is simply a repeated variation in tone over relatively small areas (Swain and 
Davis, 1978). 

- Texture is the visual effect which is produced by the spatial distribution of tonal 
variations over relatively small areas (Baraldi and Parmiggiani, 1995). 

Apart from these definitions of texture, there are some other terms which are quite 
frequently used in texture investigations. Definitions of these terms are given below: 

2.3.2.1 Texture analysis 

It is the process in which image gray level spatial relationships concerned with shape, 
position and periodicity are assigned mathematical descriptors suitable for characterizing 
the properties in such a way that they may serve as features for extraction in machine 
classification (Thomas, 1977). 

2.3.2.2 Texture feature 

Relationships including the statistical distribution of gray levels, or information about 
the boundaries of edges arising from gray level gradients or discontinuities, when 
represented by mathematical or information theory measures, are called texture features. 
These are vectors which contain a large amount of information related to the discrimination 
between class types in a given set of classes. 

2.4 Texture Analysis and Computational Methods 

In an attempt to simulate the process of visual perception for texture analysis, researchers 
have tried to extract texture by capturing some form of the first- and second-order 
statistics. In computational texture analysis, the main problems are [Ehrich and Foith, 
1978]: 

(i) Identification of the class label for a given textured region; 

(ii) Description of a given textured region; 

(iii) Construction of boundaries between the major texture regions. 

The first problem is texture classification, which requires derivation of a set of rules to 
discriminate between a given finite number of texture classes. This problem is usually 
approached by using some statistical methods for learning the characteristic parameters of 
each defined class. Description of a textured region is more difficult, as it deals with the 
extremely complex structure of texture. Accurate description of texture requires an 
understanding of the structural dependencies among basic and composite textural elements. 
Boundary definition is the most difficult problem, because texture properties may undergo 
slow spatial variation that is difficult to detect or quantify. 

2.5 Developments in Texture Classification 

From the classification point of view, texture has been studied at two levels: statistical 
and structural [Haralick, 1979]. The statistical approach aims at a global characterization 
of texture. Statistical properties of the spatial distribution of gray levels are used as texture 
descriptors. The key feature of this approach is that the description depends solely upon 
point properties, with no explicit use of elements or subregions. At this level, texture is 
assumed to be defined by a set of statistics derived from a large ensemble of local picture 
properties. The second level, structural analysis, attempts to isolate the primitives or 
units of texture and describe the relations between these in the texture pattern. 

The statistical and structural approaches have their own advantages. A lot of work has been 
done as far as statistical approaches are concerned, since they are simple. Structural 
approaches, though apparently realistic, find their utility in cases where it is possible to 
detect some identifiable patterns. This does not mean that they cannot be applied to other 
situations. The problem is that, with an increase in the scene complexity, computational 
loads go beyond feasible limits for the application of structural approaches. Hence, such 
approaches have been limited to the analysis of well defined structures only. 

2.5.1 Statistical approaches to texture analysis 

Statistical approaches are based on deriving a set of statistics of a particular order in the 
spatial or frequency domain and characterizing texture completely by these statistics. 
Statistical approaches can be divided into spatial domain analysis and frequency domain 
analysis. 

2.5.1.1 Spatial domain analysis 

This approach can be further divided into first order statistics and second order statistics. 

(a) First order statistics 

These include measures based on probability distributions of single pixel attributes. The 
most common of these are the mean and the standard deviation. The mean is a measure of 
overall image brightness, whereas the standard deviation is a measure of variability in 
spectral values within the image. Other measures include the coefficient of variation, 
higher moments of different order, spatial autocorrelation, gray level difference and run 
length statistics. In one of the earliest articles on computational texture analysis, Hawkins 
(1970) classified texture measures into four groups based on properties such as spatial 
frequency, gray level, local shape, and higher orders. The spatial frequency approach was 
based on the fact that most objects in nature have an identifiable spatial frequency of 
variation. Gray level measures included features like the average, variance, histogram, and 
gray level entropy. 
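
As an illustration of the simplest first order measures mentioned above, the following minimal Python sketch computes the mean, standard deviation and coefficient of variation of the pixel values. The function name `first_order_stats` and the small 4x4 test image are assumptions made for this example only; they are not taken from the measurements used in this thesis.

```python
import math

# Hypothetical 4x4 test image with four gray levels (0-3), for illustration only.
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]


def first_order_stats(image):
    """Return (mean, standard deviation, coefficient of variation) of the pixels."""
    pixels = [value for row in image for value in row]
    n = len(pixels)
    mean = sum(pixels) / n
    # Population variance: every pixel of the image is included.
    variance = sum((value - mean) ** 2 for value in pixels) / n
    std = math.sqrt(variance)
    cv = std / mean if mean != 0 else float("nan")
    return mean, std, cv


mean, std, cv = first_order_stats(IMAGE)
```

These measures depend only on the histogram of gray levels, so shuffling the pixels of the image leaves them unchanged; this is why they carry no information about the spatial arrangement of the pixels.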

Galloway (1975) suggested that texture could be measured by forming a gray level run 
length matrix (GLRLM) by counting the number of gray level runs of various lengths and 
then calculating various statistical features derived from this matrix that could be input to 
the classifier. 

Conners and Harlow (1980) proposed the gray level difference histogram measure 
(GLDHM), in which a probability distribution function for the absolute difference in the 
gray levels of pixels, separated by a fixed spatial relation, is constructed. From this 
distribution, various features are calculated to classify texture. 

(b) Second order statistics 

These methods are based on the spatial arrangement of adjacent pixels and consider the 
second order probability distribution. Since it is possible to derive the first order measures 
from the second order measures, the first order measures provide less information than 
second order measures. 

Haralick et al (1973) presented one of the most widely used approaches to texture analysis. 
It was based on the estimation of the second order joint probability density functions. The 
approach is called the spatial gray level dependence matrix (SGLDM) approach. The 
intermediate result of this approach is in the form of so called co-occurrence matrices. 
From these matrices, various features are calculated to classify texture. 

2.5.1.2 Frequency domain analysis 

This approach is based on using the power spectrum method (PSM). It is important first to 
explain the concept of spatial frequency. Spatial frequency is the image analog of the 
frequency of a signal in time. A sinusoidal signal with high frequency alternates rapidly, 
whereas a low frequency signal changes slowly with time. Similarly, an image with high 
spatial frequency, say in the horizontal direction, exhibits frequent changes of brightness 
with position horizontally. A picture of a crowd of people would be a particular example. 
Typically, an image is composed of a collection of both horizontal and vertical spatial 
frequency components of different strengths, and these are what the discrete Fourier 
transform indicates. Entries in the Fourier transformed image φ(r,s) represent the 
composition of the original image in terms of the spatial frequency components, both 
vertically and horizontally. The upper left hand pixel in φ(r,s), i.e. φ(0,0), is the average 
brightness of the image. This is the component in the spectrum with zero frequency in both 
directions. Thereafter, pixels of φ(r,s) both horizontally and vertically represent components 
with frequencies that increment by 1/k, where the original image is of size k×k pixels. The 
features commonly used with the PSM are (Weszka et al, 1976; Conners and Harlow, 
1980; Gool et al, 1985): 

i- Annular-ring sample geometry. 

ii- Wedge sampling geometry. 

iii- Parallel slit sampling geometry. 

2.5.2 Structural approaches to the texture analysis 

These approaches consider texture as an arrangement of a set of spatial sub-patterns 
according to certain placement rules. The sub-patterns themselves are, in general, made up 
of smaller sub-patterns, positioned according to some placement rules. This recursive 
nature of the approach captures the hierarchical structure of natural scenes, where both the 
sub-patterns and their placement may be characterized statistically. The approach is 
primarily based on the concepts of unit patterns and well defined placement rules. The 
following steps form the key to this approach (Haralick, 1979): 

- Location of unit patterns (primitives); 

- Extraction of features that characterize primitives; 

- Extraction of features that characterize the placement rules of unit patterns. 

The term primitive is used to represent a set of connected resolution cells which is 
characterized by a list of attributes. The simplest primitive can be a single pixel with its 
gray level as its attribute. Thus, primitives are maximally connected sets of resolution 
cells having some specific properties. Various attributes can be used to characterize a 
primitive, including average gray level, size or number of connected pixels, elongation, 
spread, and orientation of the primitive's axes. 

2.6 Statistical Texture Analysis Algorithms 

Consider that the image to be analyzed is rectangular and has Nx resolution cells in the 
horizontal direction and Ny resolution cells in the vertical direction, and that the gray 
level appearing in each resolution cell is quantized to Ng levels. For example, consider 
the 4x4 image with four gray levels 0-3 given by Haralick [1979]. 

Each resolution cell in the image, excluding those on the periphery of the image, has 
eight nearest neighbor cells. In the following sections, different texture analysis 
algorithms which can be used to extract the discriminating information, for the above 
example, are described. 

2.6.1 Spatial gray level dependence method (Co-occurrence) 

The gray level co-occurrence matrix describes the frequency with which one gray tone 
appears in a specified spatial relationship with another gray tone within the area under 
investigation. In other words, the image can be specified by a matrix of relative 
frequencies P(i,j) with which two neighboring resolution cells separated by distance d 
occur on the image, one with gray tone i and the other with gray tone j. Such matrices of 
gray tone spatial dependence frequencies are a function of the angular relationship between 
the neighboring resolution cells as well as of the distance between them. The following 
table shows the general form of any co-occurrence matrix. For example, the element in the 
(2,1) position is the number of times two gray levels of values 2 and 1 occurred adjacent to 
each other in a given direction. 

When the relationship is nearest horizontal neighbor, there will be 2(Nx-1) neighboring 
resolution cell pairs on each row and there are Ny rows, providing a total of 2Ny(Nx-1) 
nearest horizontal neighbor pairs. For the vertical direction, there will be 2Nx(Ny-1) 
nearest vertical neighbor pairs. For each of the left and right diagonals, there will be 
2(Ny-1)(Nx-1) nearest diagonal neighbor pairs. For the above example, the co-occurrence 
matrices in the horizontal direction, vertical direction, left diagonal direction, and right 
diagonal direction for [...] 

where p(i,j) are the entries of the co-occurrence matrices. 


Entropy (ENT) is a measure of the degree of complexity, heterogeneity or disorder in 
the image. If all the values of gray level are equally probable then the entropy will be high. 
This will be true for a very busy texture. It will be small for homogeneous images, i.e. when 
the entries in the co-occurrence matrix are very different. The entropy is a measure of 
randomness in the image, and can be defined as, 

$$\mathrm{ENT} = -\sum_{i=0}^{n-1}\sum_{j=0}^{n-1} p(i,j)\,\log p(i,j) \tag{2.2}$$


Inertia or Contrast (CON) is a measure of local variation in an image and calculates the 
moment of the co-occurrence matrix about its main diagonal. It is defined as, 

$$\mathrm{CON} = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} (i-j)^2\,p(i,j) \tag{2.3}$$


Inverse difference moment (IDM) is a measure of local similarity in the image. It 
characterizes the image in terms of lack of variability in the gray level value. IDM is high 
when the concentration of elements on the diagonal is high. IDM is defined as, 

$$\mathrm{IDM} = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} \frac{p(i,j)}{1+(i-j)^2} \tag{2.4}$$


Correlation (CORR) is a measure of linear gray level dependence and is expected to be 
associated with linear structures in the image. It is defined as, 

$$\mathrm{CORR} = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} [\ldots] \tag{2.5}$$

Variance (VAR) is a measure of heterogeneity of the image. 

$$\mathrm{VAR} = \sum_{i=0}^{n-1}\sum_{j=0}^{n-1} (i-\mu_x)(j-\mu_y)\,p(i,j) \tag{2.6}$$
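
The construction of the co-occurrence matrices and the features of Eqs. (2.2) to (2.4) can be sketched in Python as follows. This is a minimal illustration, not the implementation used in this work: the test image is the well-known 4x4 example with gray levels 0-3 and is assumed here, the function names and the direction labels in `OFFSETS` are hypothetical, and each neighboring pair is counted in both orders so that the matrices are symmetric.

```python
import math

# Well-known 4x4 test image with gray levels 0-3 (assumed here for illustration).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]

# (row step, column step) for distance d = 1; the direction names are assumptions.
OFFSETS = {
    "horizontal": (0, 1),
    "vertical": (1, 0),
    "left_diagonal": (1, 1),
    "right_diagonal": (1, -1),
}


def cooccurrence_counts(image, n_levels, offset):
    """Count neighbouring gray level pairs; each pair is counted in both orders."""
    d_row, d_col = offset
    n_rows, n_cols = len(image), len(image[0])
    counts = [[0] * n_levels for _ in range(n_levels)]
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                a, b = image[r][c], image[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1
    return counts


def normalise(counts):
    """Convert counts to relative frequencies p(i,j) that sum to one."""
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def sgldm_features(p):
    """Entropy, contrast and inverse difference moment of a normalised matrix."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    ent = -sum(v * math.log(v) for _, _, v in cells if v > 0)  # 0*log(0) taken as 0
    con = sum((i - j) ** 2 * v for i, j, v in cells)
    idm = sum(v / (1 + (i - j) ** 2) for i, j, v in cells)
    return {"ENT": ent, "CON": con, "IDM": idm}


counts = {name: cooccurrence_counts(IMAGE, 4, off) for name, off in OFFSETS.items()}
features = {name: sgldm_features(normalise(c)) for name, c in counts.items()}
```

For this 4x4 image the horizontal and vertical matrices each contain 24 counted pairs and the two diagonal matrices 18 each, in agreement with the expressions 2Ny(Nx-1), 2Nx(Ny-1) and 2(Ny-1)(Nx-1).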


2.6.2 Gray level difference histogram method (GLDHM) 

In the gray level difference histogram method, the probability distribution function of the 
absolute difference in the gray level of pixels, separated by a fixed spatial relation, is 
constructed. From this method, one gets four vectors in the four directions. Each vector has 
length equal to the difference between the maximum and minimum gray level in the image 
plus 1. For the example given in section 2.6, the differences could be 0, 1, 2 and 3. Zero 
means that the two adjacent cells have the same gray level and three means that one of the 
two adjacent cells has the maximum gray level value and the other has the minimum gray 
level value. The gray level difference histogram for a distance equal to one pixel in the 
different directions can be given as follows: 


The length of the histograms depends only upon the gray levels of the various pixels in the 
image; therefore their lengths are the same in the different directions. From these 
histograms, various textural features can be calculated to classify texture. These features 
are mean, angular second moment (ASM), contrast (CON), entropy (ENT) and inverse 
difference moment (IDM), as defined below. 


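$$\mathrm{MEAN} = \sum_{i=0}^{n-1} i\,p(i) \tag{2.7}$$

$$\mathrm{ASM} = \sum_{i=0}^{n-1} \bigl(p(i)\bigr)^2 \tag{2.8}$$

$$\mathrm{CON} = \sum_{i=0}^{n-1} i^2\,p(i) \tag{2.9}$$

$$\mathrm{ENT} = -\sum_{i=0}^{n-1} p(i)\,\log p(i) \tag{2.10}$$

$$\mathrm{IDM} = \sum_{i=0}^{n-1} \frac{p(i)}{1+i^2} \tag{2.11}$$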


where p(i) are the entries of the gray level difference histogram. 
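
A minimal Python sketch of the gray level difference histogram and of the features in Eqs. (2.7) to (2.11) is given below. It assumes the well-known 4x4 test image with gray levels 0-3 and a distance of one pixel; the function names `difference_histogram` and `gldhm_features` are hypothetical and serve this illustration only.

```python
import math

# Well-known 4x4 test image with gray levels 0-3 (assumed here for illustration).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]


def difference_histogram(image, offset):
    """Normalised histogram of absolute gray level differences for one direction."""
    d_row, d_col = offset
    n_rows, n_cols = len(image), len(image[0])
    pixels = [value for row in image for value in row]
    # Vector length: maximum minus minimum gray level in the image, plus 1.
    hist = [0] * (max(pixels) - min(pixels) + 1)
    for r in range(n_rows):
        for c in range(n_cols):
            r2, c2 = r + d_row, c + d_col
            if 0 <= r2 < n_rows and 0 <= c2 < n_cols:
                hist[abs(image[r][c] - image[r2][c2])] += 1
    total = sum(hist)
    return [count / total for count in hist]


def gldhm_features(p):
    """Features computed from a normalised difference histogram p(i)."""
    idx = range(len(p))
    return {
        "MEAN": sum(i * p[i] for i in idx),
        "ASM": sum(p[i] ** 2 for i in idx),
        "CON": sum(i ** 2 * p[i] for i in idx),
        "ENT": -sum(p[i] * math.log(p[i]) for i in idx if p[i] > 0),
        "IDM": sum(p[i] / (1 + i ** 2) for i in idx),
    }


hist_h = difference_histogram(IMAGE, (0, 1))  # horizontal direction, d = 1
feat_h = gldhm_features(hist_h)
```

For the horizontal direction of this image, 8 of the 12 neighboring pairs have difference 0, 3 have difference 1 and 1 has difference 2, so the histogram is (8/12, 3/12, 1/12, 0).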

2.6.3 Gray level run length method (GLRLM) 

GLRLM extracts gray level runs, where a run is a set of consecutive image points having 
the same gray level. The run length matrix in a given direction will be of the form: 

The entry p(i,j) in a run length matrix specifies the number of times the image contains a 
run of length j in the given direction consisting of points having gray level i. For example, 
the element in the (2,3) position shows the number of times that three adjacent cells have a 
gray level value of 2 in a given direction. In this method, for a given direction, we get a 
matrix of dimension a×b, where a corresponds to the number of values from the minimum 
to the maximum gray level in the image and b equals the number of cells of the original 
image in the corresponding direction. For the example used earlier, the run length matrix 
in the different directions will be as follows: 

For each one of the above matrices, various features can be calculated in the different 
directions. The following notation has been used for calculating the run length features: 
$n_g$ = number of gray levels in the image 
$n_r$ = number of different run lengths that occur 
$p$ = number of pixels in the image 
$p(i,j)$ = entries of the run length matrices 

$$R = \sum_{i=0}^{n_g-1}\sum_{j=1}^{n_r} p(i,j)$$

This technique extracts the following five textural features. 

Short run emphasis (SRE): This feature emphasizes short runs. For an image with a lot of 
gray level variation, and hence runs of small length, this feature will assume a high value 
compared to images which have less variation and many runs of longer lengths. It is 
defined as, 

$$\mathrm{SRE} = \frac{1}{R}\sum_{i=0}^{n_g-1}\sum_{j=1}^{n_r} \frac{p(i,j)}{j^2} \tag{2.12}$$

Long run length emphasis (LRE): This feature emphasizes long runs and hence images of 
less variation. It can be defined as, 

$$\mathrm{LRE} = \frac{1}{R}\sum_{i=0}^{n_g-1}\sum_{j=1}^{n_r} j^2\,p(i,j) \tag{2.13}$$

Gray level non-uniformity (GLN): This feature measures the gray level non-uniformity. 
When runs are equally distributed throughout the gray levels, the function takes its lowest 
value. It is defined as, 

$$\mathrm{GLN} = \frac{1}{R}\sum_{i=0}^{n_g-1}\Bigl(\sum_{j=1}^{n_r} p(i,j)\Bigr)^2 \tag{2.14}$$

Run length non-uniformity (RLN): This function measures the non-uniformity of the run 
lengths. If the runs are equally distributed throughout the lengths, the function will have a 
low value. It can be defined as, 

$$\mathrm{RLN} = \frac{1}{R}\sum_{j=1}^{n_r}\Bigl(\sum_{i=0}^{n_g-1} p(i,j)\Bigr)^2 \tag{2.15}$$

Run percentage (RP): This function is the ratio of the total number of runs to the total 
number of possible runs if all had length one. It has its lowest value for the image with 
the most linear structure. 

$$\mathrm{RP} = \frac{1}{p}\sum_{i=0}^{n_g-1}\sum_{j=1}^{n_r} p(i,j) \tag{2.16}$$
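
The run length matrix and the five features above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the well-known 4x4 test image with gray levels 0-3 is assumed, only runs along rows are extracted (the vertical direction is obtained by transposing the image, and diagonal directions are omitted for brevity), and the function names are hypothetical. The matrix has one column for every possible run length, so columns for run lengths that do not occur simply contain zeros and do not change the sums.

```python
from itertools import groupby

# Well-known 4x4 test image with gray levels 0-3 (assumed here for illustration).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]


def run_length_matrix(image, n_levels):
    """matrix[i][j-1] = number of runs of length j with gray level i, along rows."""
    max_run = len(image[0])
    matrix = [[0] * max_run for _ in range(n_levels)]
    for row in image:
        for level, run in groupby(row):
            matrix[level][len(list(run)) - 1] += 1
    return matrix


def glrlm_features(matrix, n_pixels):
    """SRE, LRE, GLN, RLN and RP of a run length matrix."""
    n_g, n_r = len(matrix), len(matrix[0])
    total_runs = sum(sum(row) for row in matrix)  # normalising factor R
    cells = [(j, matrix[i][j - 1]) for i in range(n_g) for j in range(1, n_r + 1)]
    return {
        "SRE": sum(count / j ** 2 for j, count in cells) / total_runs,
        "LRE": sum(count * j ** 2 for j, count in cells) / total_runs,
        "GLN": sum(sum(row) ** 2 for row in matrix) / total_runs,
        "RLN": sum(sum(row[j] for row in matrix) ** 2 for j in range(n_r)) / total_runs,
        "RP": total_runs / n_pixels,
    }


rlm_h = run_length_matrix(IMAGE, 4)                                # horizontal runs
rlm_v = run_length_matrix([list(col) for col in zip(*IMAGE)], 4)  # vertical runs
feat_h = glrlm_features(rlm_h, 16)
```

In the horizontal direction this image contains 8 runs in 16 pixels, which gives a run percentage of 0.5.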
2.6.4 Fourier power spectrum method 

The Fourier transform of a two dimensional function f(x,y) is defined as [Pratt, 1978], 

$$F(r,s) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,e^{-2\pi\sqrt{-1}\,(rx+sy)}\,dx\,dy \tag{2.17}$$

and the Fourier power spectrum is given as $|F|^2 = FF^*$ (where * denotes the complex 
conjugate). In the case of a coarse texture, which has less gray level variation, the value 
of the Fourier power spectrum will be high and concentrated at the origin, while in a fine 
texture, which has more gray level variation, this value will be more spread out. Thus, to 
analyze any texture, a set of features that should be useful are the averages of the Fourier 
power spectrum over ring-shaped regions or vertical and horizontal slits [Pratt, 1978]. 

For an n×n digital image, consider that f(i,j) is the gray level of the pixel at location 
(i,j). The Fourier transform of the image, in discrete form, is defined as, 

$$F(r,s) = \frac{1}{n^2}\sum_{i,j=0}^{n-1} f(i,j)\,e^{-2\pi\sqrt{-1}\,(ir+js)/n} \tag{2.18}$$

The direct calculation of the discrete Fourier transform (DFT) is very time consuming 
[Richards, 1993]. Therefore, in this work, the fast Fourier transform (FFT) has been used. 
In the FFT, the computation time increases almost linearly with the number of inputs (in 
proportion to k log2 k for k inputs), whereas in the case of the direct DFT the time 
increases quadratically. Application of the FFT requires the number of inputs, k, to be an 
integer power of 2 (say k = 2^m). Since the images used have two dimensions, the FFT is 
required to be calculated for each dimension individually (i.e. it is necessary to transform 
each row individually to generate an intermediate image, and then transform this by columns 
to yield the final result). Thus, in this study, the dimensions of the images are powers of 
2. The application of the FFT also requires that the input data fed into the algorithm be 
rearranged into even and odd inputs before the technique can be employed. For example, if 
we have 8 inputs with order 0, 1, 2, 3, 4, 5, 6, 7, the final arrangement should be 
0, 4, 2, 6 and 1, 5, 3, 7. This can be achieved simply by a process known as bit reversal. 
To do this, the index k of the input data X(k) is expressed in binary notation, the binary 
digits are reversed, and the new binary number is converted back to decimal form. For 
example: 

X(0) => X(000) => X(000) => X(0) 
X(1) => X(001) => X(100) => X(4) 
X(2) => X(010) => X(010) => X(2) 
X(3) => X(011) => X(110) => X(6) 
X(4) => X(100) => X(001) => X(1) 
X(5) => X(101) => X(101) => X(5) 
X(6) => X(110) => X(011) => X(3) 
X(7) => X(111) => X(111) => X(7) 
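
The reordering in the table above can be reproduced with a few lines of Python. This is a minimal sketch; the function names `bit_reverse` and `bit_reversed_order` are hypothetical and are not part of any particular FFT library.

```python
def bit_reverse(index, n_bits):
    """Reverse the n_bits-wide binary representation of index."""
    result = 0
    for _ in range(n_bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result


def bit_reversed_order(k):
    """Input ordering required for k inputs, where k must be a power of 2."""
    n_bits = k.bit_length() - 1
    if k < 1 or (1 << n_bits) != k:
        raise ValueError("the number of inputs must be a power of 2")
    return [bit_reverse(i, n_bits) for i in range(k)]


order = bit_reversed_order(8)  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Applying the same permutation twice restores the original order, because reversing the bits of an index twice returns the index itself.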

The standard set of texture features based on horizontal and vertical slits of the discrete 
Fourier power spectrum are of the form: 

Horizontal slits: $$S_1(m) = \sum_{s=s(m)}^{s(m+\Delta m)} |F(r,s)|^2$$

Vertical slits: $$S_2(m) = \sum_{r=r(m)}^{r(m+\Delta m)} |F(r,s)|^2$$
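
The row-then-column evaluation of the discrete transform and the slit features can be sketched in Python as follows. Several assumptions are made for this illustration: a direct DFT is used instead of an FFT to keep the code short, the transform is normalised by 1/n^2 so that F(0,0) equals the average brightness, r is taken as the vertical and s as the horizontal frequency index, and each slit feature sums the power over the stated band of one index and over all values of the other index. The function names and the 4x4 test image are hypothetical.

```python
import cmath

# Well-known 4x4 test image with gray levels 0-3 (assumed here for illustration).
IMAGE = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 2, 2, 2],
    [2, 2, 3, 3],
]


def dft_1d(values):
    """Direct (slow) one dimensional DFT, without normalisation."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * t * u / n) for t in range(n))
        for u in range(n)
    ]


def dft_2d(image):
    """Transform every row, then every column; result[r][s] is F(r,s)."""
    n = len(image)
    by_rows = [dft_1d(row) for row in image]
    by_cols = [dft_1d([by_rows[i][s] for i in range(n)]) for s in range(n)]
    return [[by_cols[s][r] / n ** 2 for s in range(n)] for r in range(n)]


def power_spectrum(image):
    """|F(r,s)|^2 for every pair of frequency indices."""
    return [[abs(value) ** 2 for value in row] for row in dft_2d(image)]


def horizontal_slit(power, s_start, s_stop):
    """Power summed over s_start <= s <= s_stop and all r (assumed convention)."""
    return sum(row[s] for row in power for s in range(s_start, s_stop + 1))


def vertical_slit(power, r_start, r_stop):
    """Power summed over r_start <= r <= r_stop and all s (assumed convention)."""
    return sum(sum(power[r]) for r in range(r_start, r_stop + 1))


F = dft_2d(IMAGE)
power = power_spectrum(IMAGE)
```

With this normalisation the zero frequency entry `F[0][0]` equals the mean gray level of the image, and the power summed over the whole spectrum equals the mean of the squared gray levels.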


2.7 Partial Discharge and Texture Analysis 

It is clear from the above description of the texture analysis techniques used to extract 
discriminative features that they can differentiate between different textures by using the 
gray level variation between the pixels. Partial discharge is an electrical phenomenon 
which gives rise to electric pulses of a stochastic nature. Measurement of partial 
discharge with a wideband detector makes it possible to detect the partial discharge on a 
pulse-by-pulse basis, as shown in Fig. 2.1. Fig. 2.1a shows typical glow corona pulses 
measured for the positive half cycle of the applied voltage. The half cycle was divided 
into 2048 phase windows. By using a low pass filter, which represents each phase window by 
the average value of its neighborhood, one can obtain the low frequency component shown in 
Fig. 2.1b. The filtered partial discharge pulses can then be found directly by subtracting 
the low frequency component from the original measurement, as shown in Fig. 2.1c. 
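
The filtering step can be sketched in Python as a moving average followed by a subtraction. The neighborhood size (`half_width`), the handling of the first and last windows, and the synthetic test signal are assumptions made for this illustration; they are not the settings used for the measurements in Fig. 2.1.

```python
def moving_average(signal, half_width):
    """Low pass filter: replace each sample by the mean of its neighbourhood."""
    n = len(signal)
    smoothed = []
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))  # window truncated at the edges
    return smoothed


def extract_pulses(signal, half_width):
    """Subtract the low frequency component from the original measurement."""
    low = moving_average(signal, half_width)
    return [x - l for x, l in zip(signal, low)]


# Synthetic example: a slow ramp (low frequency) with one pulse at window 8.
signal = [0.1 * k for k in range(16)]
signal[8] += 5.0
pulses = extract_pulses(signal, half_width=2)
```

In this synthetic example the slow ramp is removed in the interior of the record while the pulse at window 8 is retained, although with a reduced height because the pulse itself contributes to the local average.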

If more than one cycle (say m) is measured, the measurements can be arranged in an 
array such that the number of horizontal inputs n is the number of phase windows in each 
cycle and the number of vertical inputs m is the number of measured cycles. In other words, 
this array consists of m rows and n columns. Figs. 2.2 and 2.3 are examples of different 
partial discharge measurements, shown as images. Each image is of 64×64 pixels (i.e. 64 
cycles, with only 64 phase windows selected from each cycle around the peak of the 
applied high voltage waveform, for clarity). If the gray level values used in texture 
analysis algorithms are replaced by the partial discharge pulse magnitudes, the same 
concepts can be used to study the variation of the partial discharge pulse magnitudes. By 
studying this variation, the relation between adjacent pulses can be determined, so that 
the effect of the memory propagation can be considered. 

Since these pulses are stochastic in nature and it is very difficult to formulate any
relationship between them, a statistical approach will be used to generate the features
which describe the surfaces of these images. Each image will be constructed for a
particular partial discharge source, i.e. there is no more than one partial discharge source in
the same image. Hence, the problems of computational texture analysis, such as texture
description and the boundary between different texture regions, are not considered in this
thesis. In this thesis, the attempt is to classify or identify the source of each image by
studying the relationship between the pulse heights and their spatial distributions.



Fig. 2.1a: Original glow corona pulses

Fig. 2.1b: Low frequency component

Fig. 2.1c: Filtered glow corona pulses

2.8 Conclusion

In this chapter, the concept of texture has been discussed and different approaches for
texture analysis were described. Four different techniques have been explained in detail.
Three techniques are in the spatial domain and one in the frequency domain. These
techniques are the spatial gray level dependence method (SGLDM), the gray level difference
histogram method (GLDHM), the gray level run length method (GLRLM) and the power
spectrum method (PSM). The features generated from these techniques are used in
the subsequent chapters to discriminate between different partial discharge sources. The
discriminating power of these four techniques, as well as the relative discriminating power
of the features of each technique, will be investigated for partial discharge source
classification.

Chapter 3

Experimental Set Up 


3.1 Introduction 

Generation of different partial discharge patterns is the first step to investigate the
application of texture analysis algorithms in partial discharge source classification. To
generate these patterns, different partial discharge sources have to be simulated in the
laboratory and a suitable measuring circuit has to be used. This chapter explains how
different types of partial discharge sources have been generated and measured in the
laboratory. The samples and equipment used, as well as the circuit
utilized for recording the partial discharge patterns, are described.

3.2 Measurement Circuit 

In practice, the most widely applied partial discharge measurement technique is electric
pulse detection. This method is based on the measurement of the current impulses caused
by a discharge in the defect. The partial discharges were measured using a straight
detection circuit as shown in Fig. 3.1. This configuration permits the test sample to be
grounded directly. The measurement set-up is principally composed of a high voltage
source, test sample, coupling capacitor, measuring impedance, filter, amplifier, digital
storage oscilloscope and personal computer. The high frequency partial discharge pulses
flow through the coupling capacitor and the detection impedance. The measured pulses are
filtered, amplified, digitised by a digital storage oscilloscope and then processed by a
computer. The specifications of all these components are given in the following sections.



Fig. 3.1: Partial discharge detection circuit 


3.2.1 High voltage supply

The variable high voltage supply was obtained from a 100 kV, 50 kVA, ac power
frequency partial discharge free test transformer. The high voltage terminal of the
transformer was connected to a 1.1 nF, 100 kV partial discharge free high voltage
coupling capacitor with the help of a 7.5 cm diameter aluminum pipe. The measurements of
the partial discharge pulses were taken from the low voltage end of the coupling capacitor.
Both the test transformer and the coupling capacitor high voltage terminals are provided
with domes in order to prevent any partial discharge up to their rated voltages. These were
verified as being partial discharge free up to 95 kV. The coupling capacitor was connected
to the test sample with the help of a 3 cm diameter flexible pipe which has a bell shaped end
to prevent partial discharge on the connecting leads.

3.2.2 Measuring impedance

An RLC parallel circuit was used as the measuring impedance. It contains an inductance
which is used to suppress the 50 Hz alternating current and its harmonics, such that the
unbiased partial discharge signal can be applied to the measuring circuit. With the help
of a variable resistor in the range of 150 Ω to 12 kΩ, the measurement sensitivity can be
adapted to the corresponding test requirement. It also contains a spark gap, a charge
eliminator and four diodes for over voltage protection. This impedance was also used to
facilitate pulse observations, since the inductance and capacitance produce an oscillatory
response which persists much longer than the initiating pulse.

3.2.3 Filter and amplifier 

Since the partial discharge pulses are, in general, a series of very short duration pulses
having a magnitude of the order of millivolts superimposed on the power frequency, these have
to be filtered and amplified. An electric filter is often a frequency selection circuit that
passes a specified band of frequencies and blocks or attenuates signals of frequencies
outside this band. Filters may be classified in a number of ways [Ramakant, 1993]:

(i) Analog or digital 

(ii) Passive or active 

(iii) Audio or radio frequency. 

Analog filters are designed to process analog signals. This is natural when dealing with
signals that are continuous. Digital filters, however, process discrete signals.
Depending upon the type of the elements used in their construction, filters may be
classified as passive or active. Elements used in passive filters are resistors, capacitors and
inductors. Active filters, on the other hand, employ transistors or operational amplifiers in
addition to the resistors and the capacitors. The type of elements used dictates the
operating frequency range of the filter. For example, RC filters are commonly used for
audio or low frequency operation, whereas LC filters are employed at radio frequency or
high frequency. In the audio frequency range, inductors are often not used because they are
very large, costly and may dissipate more power. An active filter offers the following
advantages over a passive one:

(i) Gain and frequency adjustment flexibility: Since the operational amplifier
is capable of providing a gain, the input signal is not attenuated as in the 
case of passive filters. In addition, an active filter is easier to tune or adjust. 

(ii) Reduced loading problem: Because of the high input and low output 
resistances of the operational amplifier, the active filters do not cause 
excessive loading to the source. 

(iii) Cost effectiveness: Typically, active filters are more economical than
passive filters. This is because of the availability of a variety of cheap
operational amplifiers and the absence of inductors.

The most commonly used filters are the low pass, high pass, band pass, band
reject and all pass filters. In this study, since the measured partial discharge pulses
have to be separated from the low frequency harmonics and the very high frequency
interference, an analog active wide band pass filter was used. A wide band pass filter can be
formed by simply cascading high pass and low pass sections. To obtain a ± 20 dB / decade
band pass filter, first order high pass and low pass sections are cascaded. For a ± 40 dB /
decade band pass filter, second order high pass and second order low pass sections are
cascaded. In this study, first order high pass and low pass sections have been used. For the
low pass filter, the higher cut off frequency was 400 kHz, whereas for the high pass filter
the lower cut off frequency was 40 kHz. The filter was constructed with an operational
amplifier as the active element and a capacitor and resistors as passive elements. The
low pass filter was designed by selecting a capacitor C and then calculating the value of
the resistance R with the relation:


$$R = \frac{1}{2\pi f_H C} \qquad (3.1)$$

where $f_H$ is the high cut off frequency. The schematic diagram of the low pass filter is
shown in Fig. 3.2. The values of $R_1$ and $R_F$ determine the pass band gain $A_F$ as follows:

$$A_F = 1 + \frac{R_F}{R_1} \qquad (3.2)$$

For the low pass filter, the capacitor C was selected as 0.1 nF and the resistor R as (9.4 kΩ.
The high pass filter was constructed simply by interchanging the positions of the resistor R
and the capacitor C used in the low pass filter. The schematic diagram of the high pass
filter is shown in Fig. 3.3. The values of R and C for the low cut off frequency of 40 kHz were
selected as 4 kΩ and 1 nF, respectively.


Fig. 3.2: Low pass filter with 400 kHz cut off frequency

Fig. 3.3: High pass filter with 40 kHz cut off frequency
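The first order design relation for R and the cut off frequencies quoted above can be checked numerically. The following is a minimal sketch, assuming ideal first order RC sections and the standard non-inverting amplifier gain; the function names are illustrative, and the gain resistors are left as parameters because their values are not given in the text.

```python
import math


def resistance_for_cutoff(f_hz: float, c_farad: float) -> float:
    """R = 1 / (2*pi*f*C) for a first order RC section."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farad)


def cutoff_frequency(r_ohm: float, c_farad: float) -> float:
    """Cut off frequency f = 1 / (2*pi*R*C) of a first order RC section."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def passband_gain(r_f: float, r_1: float) -> float:
    """Pass band gain of a non-inverting stage, 1 + R_F / R_1 (values are assumptions)."""
    return 1.0 + r_f / r_1


def bandpass_magnitude(f_hz: float, f_low_hz: float, f_high_hz: float,
                       gain: float = 1.0) -> float:
    """Magnitude response of cascaded first order high pass and low pass sections."""
    ratio_low = f_hz / f_low_hz
    high_pass = ratio_low / math.sqrt(1.0 + ratio_low ** 2)
    low_pass = 1.0 / math.sqrt(1.0 + (f_hz / f_high_hz) ** 2)
    return gain * high_pass * low_pass


r_low_pass = resistance_for_cutoff(400e3, 0.1e-9)   # low pass: 400 kHz with C = 0.1 nF
f_high_pass = cutoff_frequency(4e3, 1e-9)           # high pass: R = 4 kOhm, C = 1 nF
print(f"{r_low_pass:.0f} ohm, {f_high_pass / 1e3:.1f} kHz")  # 3979 ohm, 39.8 kHz
```

With C = 0.1 nF the relation gives a resistance of about 4 kΩ for a 400 kHz cut off, and the high pass values of 4 kΩ and 1 nF give a cut off close to the stated 40 kHz.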


Fig. 3.4 shows the effect of using the filter and the amplifier on the measured partial
discharge pulses. In Fig. 3.4, channel 1 (CH1) shows the input pulses of the filter and
amplifier while channel 2 (CH2) shows the output pulses. The effect of the filter and
amplifier can be noticed by comparing the scaling of the two channels (channel 1 was
adjusted to 0.2 V/cm while channel 2 was adjusted to 5 V/cm).

Fig. 3.4: The effect of the filter and the amplifier on the measurements
(CH1: The partial discharge pulses before the filter & the amplifier
CH2: The partial discharge pulses after the filter & the amplifier)


3.2.4 Digital oscilloscope 

A Kikusui COR5502U oscilloscope was used for digitizing and storing the PD data. The
oscilloscope has two saving memory units, each having a 4k word capacity. The maximum
sampling rate of the oscilloscope is 100 MS/s (10 ns). The signals from both channels
can be digitized, since each channel has its own A/D converter. The input analog signal
is converted into digital form by the A/D converter and its output is copied onto the
acquisition RAM. Then the data is transferred from the acquisition RAM to the display
RAM. The oscilloscope displays it on the CRT screen with the help of a D/A converter.
After displaying the waveform for about one second, the next data acquisition cycle starts.
The IF02-COR interface, which is based on the standard RS-232C link, was attached to
the oscilloscope. RS-232C is used for communication between the oscilloscope and peripheral
terminals at relatively slow transmission rates. Alphanumeric communication is most
frequently done using 7-bit ASCII code. Generally ASCII can be transmitted either as a
parallel 8-bit group (8 separate wires) or as a serial string of 8 bits, one after the other,
over a single line. RS-232C was used for serial transmission. Communication between the
computer and the oscilloscope was done using full handshaking. Since the oscilloscope was
working in the single triggering mode, the computer sends a triggering command to the
oscilloscope and then waits for a message from the oscilloscope that the waveform has been
captured and digitized and is ready for transfer. After transferring the digitized waveform
to the computer, another triggering command is given for the next waveform. In the measurement
set up, the waveform was the partial discharge pulses superimposed on a power frequency
cycle. The number of cycles required for measurement could be controlled by the
computer.

Since the partial discharge pulses occur at the maximum rising or maximum decreasing
parts of the applied high voltage only and the patterns of the pulses at these parts are
different, it is required to study each part individually. Therefore, an RC circuit was used to
shift the triggering signal by 90 degrees to locate the partial discharge pulses around the
peak of the triggering sine wave. Figs. 3.5a and 3.5b show the position of the partial
discharge pulses with respect to the applied voltage before and after using the phase
shifter, respectively.

Fig. 3.5a: The partial discharge pulses before the phase shifter

Fig. 3.5b: The partial discharge pulses after the phase shifter

Fig. 3.5: The effect of the phase shifter on the partial discharge measurements

3.2.5 Personal computer

Due to the increasing trend towards automatic partial discharge measurements in recent years,
the use of digital systems has become very popular. A personal computer offers the
opportunity to store the discharge pulse sequence and to process it in the course of time
or as a function of the power frequency cycle. In the present set-up, a Pentium computer
with 24 MB RAM, a 1.6 GB hard disk and 150 MHz speed was used. For the pulses of
each half cycle transferred to the personal computer, the maximum value of the partial
discharge pulses was detected. All the pulses which had a magnitude of less than 10 %
of the maximum pulse magnitude in the same half cycle were ignored. Therefore,
generally each pulse has been represented by only its first peak or by both the first and second
peaks, depending upon the relative magnitude of the first peak and subsequent peaks.
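The 10 % rule described above amounts to a simple threshold relative to the largest pulse of each half cycle. The following is a minimal sketch of that step; the function name and the choice to set ignored samples to zero are assumptions made for illustration.

```python
def threshold_pulses(half_cycle: list[float], fraction: float = 0.10) -> list[float]:
    """Ignore samples whose magnitude is below `fraction` of the half cycle maximum.

    Assumption: ignored samples are replaced by 0.0 so that the phase window
    positions of the remaining pulses are preserved.
    """
    peak = max((abs(v) for v in half_cycle), default=0.0)
    limit = fraction * peak
    return [v if abs(v) >= limit else 0.0 for v in half_cycle]


samples = [0.5, 10.0, 0.9, 1.5, -2.0]
print(threshold_pulses(samples))  # [0.0, 10.0, 0.0, 1.5, -2.0]
```

Because the limit is relative to the maximum of the same half cycle, the smaller trailing peaks of an oscillatory pulse response tend to be discarded while the first one or two peaks remain.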

3.3 Samples 

In order to generate different partial discharge patterns, 6 types of standard defects were 
simulated with the help of simple physical models as shown in Fig. 3.6. 


3.3.1 Glow corona

This was produced using a point and plane electrode system with a gap distance of 15 cm in
atmospheric air. The point electrode was 15 cm long with a diameter of 1.2 cm. The tip
was a 10 cm long cone with an angle of around 7°. A plane electrode with 7.5 cm
diameter was used. The partial discharge inception voltage was around 7.5 kV.


3.3.2 Streamer corona 

This was produced using rod and plane electrodes with a gap of 15 cm in atmospheric
air. The rod electrode was 13 cm long with a diameter of 2.5 cm. The tip was a hemisphere
with a radius of 1.25 cm. A plane electrode of 7.5 cm diameter was used. The partial discharge
inception voltage for this electrode system was measured to be around 50 kV.



Fig. 3.6: Electrode systems for partial discharge sources created in the laboratory
(panels: glow corona, streamer corona, surface discharge, internal discharge, single
protrusion, multi protrusions)


3.3.3 Surface discharge in air 

A glass sheet was inserted between a rod and a plane electrode system. The internal
discharge was eliminated by using a thin layer of transformer oil between the glass and the
grounded electrode. The inception voltage for this case was very low compared to the other
electrode systems. It was about 6 kV.

3.3.4 Internal discharge 

A cavity was produced by creating a hole in a bakelite sheet of thickness 1.6 mm. Two
15×15 cm sheets with a central hole were inserted between another two sheets of the same
dimensions. The diameter of the cavity and its height were 4 mm and 3.2 mm, respectively.
Surface discharge between the bakelite sheets as well as between the sheets and the
electrodes was prevented by immersing the test sample and the electrodes in transformer oil.
The oil was not allowed to fill up the cavity. The inception voltage for this cavity was
measured to be 28 kV.

3.3.5 Single protrusion 

A small needle was used in an electrode system having a nearly uniform electric field to
generate partial discharge in air. Two circular plates with rounded edges, of 12 cm diameter
and with a 10 cm air gap, were used. The partial discharge inception voltage was measured to
be 25 kV.

3.3.6 Multi protrusions 

A number of small needles were used in an electrode system having a nearly uniform electric
field to generate partial discharge in air. The same electrode system which was used in the
single protrusion case was also used in this case. The partial discharge inception voltage was
measured to be around 20 kV.

Each one of these test electrode systems was subjected to a voltage raised slowly up to 20%
above the PD inception voltage, the latter being defined as the value of applied
voltage at which the discharge, once initiated, does not disappear with time (self-extinction).
All the plane electrodes used had a curved profile.

3.4 Conclusion 

In this chapter, the partial discharge measuring circuit as well as the different electrode
systems used to generate the partial discharge sources have been described. Six different
partial discharge sources were generated in the laboratory. These sources include glow
corona, streamer corona, surface discharge, internal discharge, single protrusion and multi
protrusions. From each one of these sources, several partial discharge patterns were
generated. In the subsequent chapters, the texture analysis algorithms will be
examined for their ability to distinguish between the different partial discharge sources created
in the laboratory.

Chapter 4

Classification of Partial Discharge Patterns 
Using Minimum Distance Classifier 


4.1 Introduction 

Chapter 2 gave a review of texture analysis and revealed that texture information, which
has been used successfully in various applications, has potential for classification of partial
discharge sources. However, the main questions to be addressed are:

(i) How effectively can the texture information be used for partial discharge source
classification?

(ii) What is the effect of changing the number of cycles on the classification accuracy
of partial discharge sources?

(iii) Which features have good discriminating power?

(iv) What is the minimum number of features which can be used at a time to achieve a
certain desired classification accuracy?

This chapter evaluates the ability of four texture analysis algorithms to classify partial
discharge sources. The algorithms which have been examined are the spatial gray level
dependence method (SGLDM), the gray level difference histogram method (GLDHM), the
gray level run length method (GLRLM) and the power spectrum method (PSM). Only two
directions have been used in this thesis: the horizontal and the vertical directions. These
two directions have been selected so that one can investigate the relationship
between the adjacent pulses in the same cycle (horizontal direction) as well as between the
pulses in the same phase window (vertical direction).

A comparison between the features based on these algorithms and some statistical features
calculated from the conventional phase resolved pattern analysis method (q-φ-n
distribution) has been carried out. The optimal features of each of these algorithms have
been selected according to their classification accuracy. Experimental tests were carried out on
samples with different discharge sources to establish the discriminating power of these
algorithms using a minimum distance classifier [Tou and Gonzalez, 1974].

4.2 Selection of the Partial Discharge Patterns 

In order to apply the texture analysis algorithms, the proper image of each defect has to be 
created. This requires a selection of proper sampling technique and the sample size as 
described below. 

4.2.1 Sampling techniques 

The first step in the generation of partial discharge patterns is the selection of only a few
power frequency cycles out of the series of cycles generated during the experiment. In other
words, a sample for each partial discharge source has to be selected. In the literature, three
sampling techniques are available [Bhattacharyya, 1977]:

(i) Simple random sampling

(ii) Stratified random sampling 

(iii) Systematic sampling 

In the first technique, one could measure N cycles at random to construct a partial discharge
image. The second technique also uses the concept of simple random sampling.
However, to improve the accuracy attained for different types of population, the
complete data is divided into a set of homogeneous segments (sub-populations) and then
independent simple random samples are drawn from the individual sub-populations. In the
third technique, the purpose is to spread the sample over the population. This technique
divides the population into several sub-populations and then randomly selects a cycle from
the first sub-population (say cycle number K) and the cycles having the same number in the
other sub-populations. Thus, to use the stratified random sampling and systematic sampling
techniques, the size of the population should be known. In the case of partial discharge
patterns, the size of the population (the number of cycles which can be measured) is infinite.
So, the simple random sampling technique is considered to be the most suitable choice for this
case. In the experiment, N cycles were selected and measured randomly. However, this
involved a constant time lag equal to the digitising time plus the time of transferring the
data to the personal computer.
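The difference between the three sampling techniques can be illustrated on a finite list of cycle indices. This is only a sketch under simplifying assumptions: the population size is taken as known and finite, the strata are equal contiguous blocks with one cycle drawn from each, and all function names are illustrative.

```python
import random


def simple_random_sample(population_size: int, n: int, rng: random.Random) -> list[int]:
    """Draw n distinct cycle indices at random from the whole population."""
    return sorted(rng.sample(range(population_size), n))


def stratified_random_sample(population_size: int, n: int, rng: random.Random) -> list[int]:
    """Draw one cycle independently from each of n equal, contiguous sub-populations."""
    stride = population_size // n
    return [i * stride + rng.randrange(stride) for i in range(n)]


def systematic_sample(population_size: int, n: int, rng: random.Random) -> list[int]:
    """Pick cycle K at random in the first sub-population, then cycle K in every other one."""
    stride = population_size // n
    k = rng.randrange(stride)
    return [i * stride + k for i in range(n)]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
print(simple_random_sample(64, 8, rng))
print(stratified_random_sample(64, 8, rng))
print(systematic_sample(64, 8, rng))
```

Only the first function works without knowing where the population ends, which is the reason given above for choosing simple random sampling when the number of measurable cycles is unbounded.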

If the partial discharge measurements are taken through N cycles of the applied power
frequency high voltage waveform and each cycle is divided into M windows, an image of
dimension N×M pixels is obtained. Replacing the gray level values used for image
interpretation by the magnitude of the partial discharge pulses, the texture analysis
algorithms can be applied to extract discriminating features for different partial discharge
sources. In other words, the magnitude of each pulse can be compared with the magnitude
of the adjacent pulses (right and left) in the same cycle as well as with the magnitude of
pulses in the same window (phase angle position) in the other cycles (up and down). Thus
only two directions are required, the horizontal (in the same cycle) and the vertical
(different cycles). In the present work, a total of 4096 windows were selected in each cycle and
the number of cycles (N) was varied between 2 and 32 for each partial discharge source.
Since the behaviour of partial discharge is different in the positive and negative half cycles,
the texture features were calculated for each half cycle individually.
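The two directions can be made concrete with a small example on an N×M array of pulse magnitudes. The sketch below only collects the magnitude differences between neighbouring pixels at a displacement of one; it is an illustration of the horizontal and vertical comparisons, not an implementation of any of the four texture methods, and the function name is an assumption.

```python
def neighbour_differences(image: list[list[float]], direction: str) -> list[float]:
    """Absolute magnitude differences between adjacent pixels.

    'horizontal': adjacent phase windows within the same cycle (same row).
    'vertical':   the same phase window in consecutive cycles (same column).
    """
    rows, cols = len(image), len(image[0])
    if direction == "horizontal":
        return [abs(image[r][c + 1] - image[r][c])
                for r in range(rows) for c in range(cols - 1)]
    if direction == "vertical":
        return [abs(image[r + 1][c] - image[r][c])
                for r in range(rows - 1) for c in range(cols)]
    raise ValueError("direction must be 'horizontal' or 'vertical'")


# Two cycles (rows), four phase windows (columns).
image = [[0, 3, 0, 0],
         [0, 2, 0, 1]]
print(neighbour_differences(image, "horizontal"))  # [3, 3, 0, 2, 2, 1]
print(neighbour_differences(image, "vertical"))    # [0, 1, 0, 1]
```

Here the horizontal differences are large because each pulse is isolated within its cycle, while the vertical differences are small because the pulse recurs at the same phase window in both cycles.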

4.2.2 Training data refinement 

For each technique, all features have been calculated for a sample size of 30 patterns
belonging to each class (partial discharge source), using a number of cycles varying
between 2 and 32. The mean values and the standard deviations of all the features have
been calculated for each technique individually. Sometimes it is possible that some aberrant
patterns are included in the training data. Automatic methods to remove such
patterns have been suggested (Buttner et al., 1989). In the present investigation, this
refinement of the training patterns was performed by removing those patterns whose
distance to the actual mean was greater than a predefined limit. In fact, such patterns
correspond to a low probability of belonging to a class defined by its mean and covariance
matrix. Thus the i-th pattern, characterised by the value $x_{ij}$ of feature j, is removed
from the training data if the following inequality is fulfilled:

$$|x_{ij} - \mu_j| > K \sigma_j \qquad (4.1)$$

where

$\mu_j$ is the mean value of feature j

$\sigma_j$ is the standard deviation of feature j

K is a specified parameter for the cut-off limit


In the present study K was taken as 3. Table 4.1 shows typical results for the GLDHM
features calculated for two cycles. Similar tables were generated for each technique and for the
five cases corresponding to 2, 4, 8, 16, 32 cycles.
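The refinement rule can be sketched for the values of a single feature as follows. This is a minimal illustration, assuming that the mean and the sample standard deviation are computed once from the unrefined data and that each feature is refined independently; the function name and the synthetic data are assumptions.

```python
import statistics


def refine_feature(values: list[float], k: float = 3.0) -> list[float]:
    """Keep only the values within k standard deviations of the mean.

    Assumption: the sample standard deviation (n - 1 in the denominator) is used.
    """
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [v for v in values if abs(v - mu) <= k * sigma]


# 30 synthetic patterns of one feature, one of them aberrant.
feature_values = [1.0] * 29 + [10.0]
refined = refine_feature(feature_values, k=3.0)
print(len(feature_values), len(refined))  # 30 29
```

Removing a single aberrant pattern in this way lowers the standard deviation of the remaining patterns considerably, which is the kind of change visible between the unrefined and refined columns of a table such as Table 4.1.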


Table 4.1 Mean and standard deviation of GLDHM features

            Without refinement                     With refinement
Feature     Mean value   SD          No. of        Mean value   SD          No. of
                                     patterns                               patterns
1           28.23897     3.558595    30            27.70894     2.094585    29
2           93.63261     2.275512    30            93.63261     2.275512    30
3           23.02265     1.878695    30            23.02265     1.878695    30
4           35.45954     8.339054    30            34.06636     3.422692    29
5           96.77002     1.177092    30            96.77002     1.177092    30
6           89.91356     12.8845     30            92.03762     5.635258    29
7           30.0577      9.975033    30            28.27342     2.0333      29
8           91.88288     7.293665    30            93.20567     0.853631    29
9           25.49458     2.700341    30            25.13637     1.8882      29
10          30.37757     13.38121    30            27.97679     2.522866    29
11          95.70605     4.711396    30            95.70605     4.711396    30
12          2.450556     12.37675    30
13          99.55195     2.288191    30            99.55195     2.288191    30
14          0.194182     0.948212    30
15          3.701664     18.23956    30            -2.15E-05    1.59E-08    19
16          99.77503     1.15135     30            99.77503     1.15135     30
17          -71.9008     47.4012     30            -100         6.37E-08    22
18          3.420175     18.24341    30
19          98.54305     7.806394    30            99.9681      0.130966    29
20          0.355971     1.863411    30
21          3.509818     18.23434    30            -1.02E-05    1.33E-09    19
22          99.09216     4.887973    30            99.09216     4.887973    30



4.2.3 Sample size determination 

The vector of features calculated for each partial discharge source using N cycles represents
a pattern in the feature space. Each pattern class consists of several patterns which represent
that class. However, an adequate number of patterns has to be used in each pattern class.
In determining the sample size (i.e. the number of patterns) in a statistical experiment, the
following questions should be addressed [Pfaffenberger and Patterson, 1977]:

(i) How close is the estimate required to be to the true value of the population
parameter?

(ii) How certain is it desired that the estimate will be within the selected number of
units of the value of the parameter?

For example, it may be specified that the estimate $\mu_1$ of the population mean obtained
from the patterns should not be more than ten units away from the true mean $\mu$, with a
confidence of 95 percent. This implies that one is willing to take a 5 percent chance that the
sample (the measured partial discharge patterns) of size n will produce an estimated mean
$\mu_1$ which is more than ten units away from the actual value $\mu$. The confidence coefficient
depends on how many standard deviations are included. The following table shows
the relationship between the confidence coefficient and the standard deviation σ.


Confidence Coefficient (Z)          0.90    0.95    0.99
Times of Standard Deviation (σ)     1.64    1.96    2.58


For example, a confidence of 95 percent could be mathematically expressed as:

$$\mathrm{prob}(\mu_1 - 1.96\sigma < \mu < \mu_1 + 1.96\sigma) = 0.95 \qquad (4.2)$$

Generally, the standard deviation of the population is unknown. Therefore it is replaced by
$s/\sqrt{n}$, where s is the sample standard deviation and n is the sample size. For an error
which should not exceed a value d, one can get the following equation:

$$d = \frac{Z s}{\sqrt{n}} \qquad (4.3)$$

Therefore,

$$n = \left(\frac{Z s}{d}\right)^2 \qquad (4.4)$$


This equation determines the required sample size, i.e. the number of partial discharge
patterns which will represent one defect or partial discharge source, for a confidence
coefficient (Z) and the absolute error $(\mu - \mu_1)$ not exceeding d. In deciding whether to
compute a 95% confidence interval or a 99% confidence interval, it may seem
ridiculous to settle for a lower level of confidence when a higher level could be obtained.
However, the old adage "you must give up something to get something" remains true here,
because a 95% interval extends 1.96σ to either side of the mean $\mu_1$. Similarly, for 99%
confidence the interval extends 2.58σ to either side of the mean $\mu_1$. This means that the higher
the desired degree of confidence, the longer the resulting interval. If the length of the
interval is taken as the measure of precision or accuracy, then the confidence level of the
interval is inversely related to its precision for the same sample size [Devore, 1987].
Otherwise, to achieve the same precision the sample size should be increased. For the
features of the texture analysis algorithms, the variance is not the same. Therefore, it is
important to select the sample size according to the feature which has the maximum coefficient
of variation α. The coefficient of variation of the i-th feature is its standard deviation
divided by its mean value, as given below:

$$\alpha_i = \frac{\sigma_i}{\mathrm{mean}(i)} \qquad (4.5)$$
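The sample size calculation can be sketched numerically as follows. This is a minimal illustration assuming that the acceptable error is expressed as a fraction of the mean, so that the ratio s/d becomes the coefficient of variation divided by the relative error; rounding up to the next integer, the function names and the example coefficients of variation are assumptions and are not taken from the measurements.

```python
import math


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Standard deviation divided by the (absolute) mean value of a feature."""
    return sd / abs(mean)


def required_sample_size(z: float, cv: float, relative_error: float) -> int:
    """n = (Z * s / d)^2 with s and d both expressed as fractions of the mean.

    Assumption: the result is rounded up to the next whole pattern.
    """
    return math.ceil((z * cv / relative_error) ** 2)


# Hypothetical coefficients of variation of three features of one technique.
cvs = [0.04, 0.10, 0.07]
worst = max(cvs)  # the feature with the maximum coefficient of variation decides
print(required_sample_size(1.96, worst, 0.05))  # 16
print(required_sample_size(1.64, worst, 0.05))  # 11
print(required_sample_size(1.96, worst, 0.10))  # 4
```

Doubling the acceptable error reduces the required number of patterns by a factor of about four, and moving from 1.64 to 1.96 increases it by a factor of about 1.43, which is the same scaling that relates the 5% and 10% columns and the two confidence levels in any table built from this formula.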

The final sample size for each technique has been calculated according to the feature
which has the maximum coefficient of variation. For computing the coefficient of variation
of each feature, the mean and standard deviation values obtained after refinement have been
used. The sample size has been calculated for confidence intervals of 1.64 (90%) and 1.96
(95%) as well as for 5% and 10% errors from the means. Tables 4.2 and 4.3 show the
estimated sample size (number of patterns in each pattern class) for the different techniques
(GLDHM, SGLDM, GLRLM, PSM, q-φ-n).


Table 4.2 Sample size for 1.64 confidence interval.

             Number of cycles and acceptable error from the mean value
Techniques   2 cycles     4 cycles     8 cycles     16 cycles    32 cycles
             5%    10%    5%    10%    5%    10%    5%    10%    5%    10%
GLDHM        20    5      14    3      9     2      9     2      10    2
SGLDM        20    5      14    3      9     2      9     2      11    2
GLRLM        5     1      8     2      8     2      13    3      17    4
PSM          672   168    177   44     84    21     84    21     34    8
q-φ-n        18    4      38    9      15           25    6      31

Table 4.3 Sample size for 1.96 confidence interval.

             Number of cycles and acceptable error from the mean value
Techniques   2 cycles     4 cycles     8 cycles     16 cycles    32 cycles
             5%    10%    5%    10%    5%    10%    5%    10%    5%    10%
GLDHM        29    7      21    5      12    3      13    3      15    3
SGLDM        29    7      21    5      12    3      13    3      15    3
GLRLM        7     1      12    3      11    2      19    4      25    6
PSM          960   240    254   63     121   30     120   30     49    12
q-φ-n        25    6      55    13     22    5      36    9      45    11

From these tables the following conclusions can be drawn:

(i) A confidence interval of 1.96 and error of 5% from the mean gives a reasonable 
sample size for most of the techniques. 

(ii) For the GLDHM, SGLDM and GLRLM techniques, the mayimnm sample size was 
less than 30 patterns. This is due to the smaller values of standard deviation of the 
patterns in the same pattern class. 

(iii) For PSM the variation between the patterns was very high resulting in large sample 
size. However, this variation reduced with increasing the number of power 
frequency cycles. This is due to the reason that with the increased number of power 
frequency cycles, the information included in each pattern increases. Hence the 
patterns represent that pattern class very well and all the patterns become closer 
which reduce the coefficient of variation. 

(iv) The above conclusion, however, does not seem to be applicable to the other 
techniques. Analysing the features having the maximum coefficient of variation for 
these techniques, it is found that for PSM technique, feature number 3 has 
maximum variation irrespective of number of cycles used. However, for GLDHM 
for example, the features not same and were 9, 9, 3, 9, 4 for 2, 4, 8, 16, 32 cycles 
respectively. These feature results in sample size of 29, 21, 12, 13, 15 patterns. But 
it is interesting to notice that for the same feature like number 9, the sample size has 
decreased as in the case PSM with increasing the number of power frequency 
cycles. It was 29, 21, 13 patterns for 2, 4, 16 cycles respectively. This analysis 
implies that, for the same feature, increasing the number of cycles increase the 
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disenminating mformation which can be extracted by that feature. Similar 
observation have been made for the other techniques. 

From the above results, a sample size of 30 patterns were selected to represent each class 
wliich satisfies requirement of most of the techniques. 

4.3 Minimum Distance Classifier 

The traditional methods for classification mainly follow two approaches: unsupervised and
supervised. The unsupervised approach, often referred to as clustering, attempts to identify
natural groupings of patterns in the feature space. Having established these groups, the
analyst then tries to associate an information class with each group. However, this approach
may result in groupings that have no clear meaning from the user's point of view. In the
supervised approach of classification, the user supervises the pattern categorisation process
by specifying, to the computer algorithm, numerical descriptors of the various partial
discharge sources. To do this, representative samples of known partial discharge sources,
called training patterns, are used to compile a numerical interpretation key that describes
these patterns. Each pattern in the data set is then compared numerically to each category in
the interpretation key and labelled with the name of the category it most resembles. In the
supervised approach the user defines useful information categories and then examines
their separability, whereas in the unsupervised approach one first determines separable
classes and then defines their information content. In the present work, to identify the
source of a particular partial discharge, a supervised classification technique has been used
to label an unknown source by using a trained classifier.

Classification has been done according to the similarity between the pattern and the pattern
populations. The distance between any pattern and the pattern populations can be used as
a measure of similarity, in the sense that the smaller the distance, the greater the similarity.
Classification has been done by using the minimum distance (Mahalanobis distance)
classifier [Tou and Gonzalez, 1974]. The major difference between the Euclidean distance
and the Mahalanobis distance lies in the use, by the latter, of the sample covariance
information. In other words, the Euclidean distance measure creates a circle of points
equidistant from the mean value, while the Mahalanobis distance measure creates an
elliptical region about the mean value. The ellipse is elongated to account for correlation
between variables. If one feature has a larger variance than the other, it receives less
weight. The Mahalanobis distance is given by:

$$ D = (x - m)^{t}\, C^{-1}\, (x - m) \qquad (4.6) $$

where

C is the covariance matrix of a pattern population
m is the mean vector
x represents an unknown pattern
and the superscript t denotes the transpose.
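The decision rule built on Eq. (4.6) can be sketched as follows. This is a minimal illustration with NumPy, not the implementation used in this work: the function names, the synthetic two-feature data and the use of a pseudo-inverse (to tolerate a singular covariance matrix) are all assumptions made for the example.

```python
import numpy as np

def fit_classes(training):
    """training: dict mapping a class label to an array of shape
    (n_patterns, n_features). Returns label -> (mean, inverse covariance)."""
    model = {}
    for label, patterns in training.items():
        X = np.asarray(patterns, dtype=float)
        m = X.mean(axis=0)
        C = np.atleast_2d(np.cov(X, rowvar=False))
        # Pseudo-inverse is an assumption: it avoids failure if C is singular.
        model[label] = (m, np.linalg.pinv(C))
    return model

def mahalanobis(x, m, C_inv):
    """Distance D of Eq. (4.6) between pattern x and a class (m, C)."""
    d = np.asarray(x, dtype=float) - m
    return float(d @ C_inv @ d)

def classify(x, model):
    """Label the unknown pattern x with the class at minimum distance."""
    return min(model, key=lambda label: mahalanobis(x, *model[label]))

# Synthetic training patterns for two hypothetical sources (illustration only).
rng = np.random.default_rng(0)
training = {
    "source_A": rng.normal(loc=[0.0, 0.0], scale=1.0, size=(30, 2)),
    "source_B": rng.normal(loc=[10.0, 10.0], scale=1.0, size=(30, 2)),
}
model = fit_classes(training)
print(classify([0.2, -0.1], model))
print(classify([9.5, 10.3], model))
```

Each class is summarised only by its mean vector and covariance matrix, which is what makes the classifier parametric.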

The minimum distance classifier is a parametric classifier which relies on specifying the
probability distribution of the class values by a representative function. In this case, the
function is Gaussian and the mean and covariance are its parameters. The mean vector in
multidimensional space locates the single peak of the probability distribution and the
covariance specifies the region of influence of the probability distribution in each
dimension. Since the normal distribution provides a good approximation to many naturally
occurring phenomena, it is expected that the partial discharge classes will also be normally
distributed. The histograms of all classes, using the different features of all the techniques,
are shown in Appendix-1.

Overall classification accuracy has been calculated from the confusion matrix, which
contains information about the correct classification and misclassification of all classes.
The confusion matrix is an A x A matrix, where A is the number of classes or partial
discharge sources. The rows in the matrix represent the assumed true classes, while the
columns are associated with the classified partial discharge sources. In a confusion matrix,
the diagonal elements represent the observations which agree with the true classes and the
non-diagonal entries represent cases where observations do not agree. The degree of
agreement is given by:

$$ p = \frac{X_{11} + X_{22} + \dots + X_{AA}}{N}, \qquad N = \sum_{i=1}^{A} \sum_{j=1}^{A} X_{ij} $$

where N is the total number of classified patterns (the sum of all elements of the matrix),
p is the proportion of the patterns that agree, and X_ii are the diagonal elements of the
A x A confusion matrix. For 100 percent correct classification this matrix should be diagonal.
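As a small worked illustration of this measure, the sketch below builds a confusion matrix from true and assigned class indices and evaluates the proportion of agreement. The function names and the nine example labels are hypothetical and serve only to show the arithmetic.

```python
import numpy as np

def confusion_matrix(true_labels, assigned_labels, n_classes):
    """Rows are the assumed true classes, columns the assigned classes."""
    X = np.zeros((n_classes, n_classes), dtype=int)
    for t, a in zip(true_labels, assigned_labels):
        X[t, a] += 1
    return X

def overall_accuracy(X):
    """Proportion of agreement: sum of the diagonal divided by the total."""
    return np.trace(X) / X.sum()

# Hypothetical example: three classes, three test patterns each.
true_labels     = [0, 0, 0, 1, 1, 1, 2, 2, 2]
assigned_labels = [0, 0, 1, 1, 1, 1, 2, 0, 2]

X = confusion_matrix(true_labels, assigned_labels, n_classes=3)
print(X)
print(overall_accuracy(X))  # 7 of the 9 patterns lie on the diagonal
```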


4.4 Partial Discharge Sources Classification Results 

The main aims of the study presented in this chapter have been:

(i) To investigate the discriminating power of the texture analysis algorithms.

(ii) To identify the features which have the best discriminating power.

(iii) To determine the minimum number of features which can be used at a time to give
an acceptable classification accuracy.

(iv) To investigate the effect of changing the number of cycles used to construct the
partial discharge patterns on the classification accuracy.

To achieve the first three aims, the classification of PD sources has been done in three
stages: first by using the full set of features, second by using each feature in turn to
determine the optimum features, and last by using combinations of two features to select
the best combinations. Achieving the third aim was relatively difficult due to the
correlation between the features: one has to investigate all the possible feature
combinations. Investigations were carried out to determine the classification accuracy of
each method as well as the effect of changing the number of cycles in each pattern on the
classification accuracy. For the cases of 2, 4, 8, 16 and 32 cycles, 30 partial discharge
patterns were used for training and testing the minimum distance classifier. As shown in
Fig. 4.1a, it was found that:

(i) All the techniques achieved 100% classification accuracy for the different numbers
of cycles, with the single exception noted in (ii).

(ii) PSM has 96.66% classification accuracy for the case of 2 cycles, which increased
to 100% for the higher numbers of cycles.

(iii) Apart from this, increasing the number of cycles has no effect on the classification
accuracy of the different techniques.

In fact, changing the number of cycles changes the values of the texture features.
However, the pattern classes remain separable, which results in high classification accuracy.
To investigate the ability of the features created from the texture analysis algorithms to
generalise the information extracted from the partial discharge patterns, different patterns
were used for training and testing of the minimum distance classifier. In other words,
patterns created from different numbers of cycles were used for training and testing. For
each technique, the patterns created using 2, 4, 8, 16 and 32 cycles (30 patterns each)
were used in turn for training the minimum distance classifier, whereas testing was always
done with the 30 patterns created from 32 cycles. The classification accuracy for
this case is shown in Fig. 4.1b. From this figure, it can be observed that:

(i) SGLDM and GLDHM have almost the same classification accuracy
for the different numbers of cycles used for training.

(ii) As the difference between the number of cycles used for training and
testing increases, the classification accuracy of the different techniques is
reduced.

(iii) Reducing the number of cycles used for training has a small effect on the
classification accuracy of SGLDM and GLDHM, whereas GLRLM and
PSM are more affected by reducing the number of cycles.

(iv) The features created from the phase resolved pulse height analysis (q-φ-n
distributions) are highly affected by increasing the difference between the
cycles used for training and testing.

It is clear from Fig. 4.1b that the texture analysis algorithms are able to extract the
discriminating information between the different partial discharge sources. This
information depends only slightly upon the number of cycles used to construct the partial
discharge patterns. However, this property does not hold for the conventional method
(q-φ-n distributions).



Fig. 4.1a: Classification accuracy of different techniques using 
the same patterns for training and testing 



Fig. 4.1b: Classification accuracy of different techniques using 
different patterns for training and testing 
The classification accuracies shown in Figs. 4.1a and 4.1b were achieved by using all the
features in each technique. The large number of features, however, makes the classification
process quite time consuming. Therefore, the features which contribute little to the
classification process can be discarded. The process of feature selection can be done
according to the relative classification accuracy of the features. This has been established
for the four texture analysis algorithms in addition to the q-φ-n distributions. For each
pattern class, 15 patterns were used for training and 30 patterns (including the training
patterns) were used for testing. All patterns were created by using the same number of
cycles for training and testing.

4.4.1 Spatial gray level dependence method (SGLDM) 

As mentioned in chapter 2, SGLDM has seven features in each direction. These features
are the mean, energy, contrast, entropy, inverse difference moment, variance and
correlation. Scanning each pattern horizontally and vertically for the positive and
negative half cycles individually results in a total of 28 features. Features 1-7 and 8-14
correspond to the horizontal and the vertical directions for the positive half cycle
measurements, whereas features 15-21 and 22-28 correspond to the horizontal and the
vertical directions of the negative half cycle measurements. Each of the 28 features
was considered individually for training and testing. Fig. 4.2 shows the classification
accuracy of all the features using the six partial discharge sources discussed in chapter 3. It
can be observed from Fig. 4.2 that:

Fig. 4.2: Classification accuracy of SGLDM features

(i) The relative classification accuracy of the features depends upon the number
of cycles used to construct the patterns.

(ii) Only feature number 7 (correlation) is almost independent of the number
of cycles. It has the maximum classification accuracy in the case of 4, 8, 16
and 32 cycles, varying between 78% and 82%.

(iii) The discriminative power of the correlation between the pulses is clearest
in the horizontal direction of the positive half cycle (feature 7).

(iv) The discriminative power of the correlation between the pulses in the
vertical direction for the positive and negative half cycles is very poor. It
varies between 8% and 20% with the number of cycles for the
positive half cycle (feature 14). For the negative half cycle, it varies
between 28% and 35% (feature 28).

(v) The classification accuracies for the mean (features 1, 8, 15, 22) and the variance
(features 6, 13, 20, 27) are the same for the horizontal and vertical directions
in each half cycle individually.

(vi) The classification accuracy due to any feature in the horizontal direction is
greater than or at least equal to its classification accuracy in the vertical direction
in both the half cycles.

4.4.2 Gray level difference histogram method (GLDHM) 

In the case of GLDHM, as given in chapter 2, there are 5 features in each direction for each
half cycle. These features are the mean, energy, contrast, entropy and inverse difference
moment. For the horizontal and vertical directions of the positive and negative half cycles,
one gets 20 features. To take the polarity of the pulses into account, another feature, called
the polarity factor, was introduced. This additional feature is the difference between the
average of the positive pulses and the average of the negative pulses divided by the sum of
the average of the positive pulses and the average of the negative pulses, as given below.

$$ \text{Polarity factor} = \frac{\dfrac{1}{n^{+}} \sum_{i=1}^{n^{+}} q_i^{+} - \dfrac{1}{n^{-}} \sum_{j=1}^{n^{-}} q_j^{-}}{\dfrac{1}{n^{+}} \sum_{i=1}^{n^{+}} q_i^{+} + \dfrac{1}{n^{-}} \sum_{j=1}^{n^{-}} q_j^{-}} \qquad (4.9) $$

where n^{+} and n^{-} are the numbers of positive and negative pulses, and q_i^{+} and
q_j^{-} denote their magnitudes. The value of this factor varies between -1 and 1. It will be
positive if the average of the positive pulses is greater than the average of the negative
pulses and vice versa. This feature is the same for the horizontal and vertical directions.
Hence, it was included with the horizontal-direction features only (once for each half
cycle), which brings the total number of features to 22. Fig. 4.3 shows the classification
accuracy of the GLDHM features when considered individually. Features 1-6 were
calculated for the horizontal direction of the positive half cycle and features 7-11 were
calculated for the vertical direction of the same half cycle. Features 12-17 and 18-22 were
calculated for the negative half cycle in the horizontal and vertical directions, respectively.
From this figure, it is clear that:



Fig. 4.3: Classification accuracy of GLDHM features 

(i) The relative classification accuracy of the features depends upon the number
of cycles used to construct the patterns.

(ii) In the positive half cycle, the classification accuracy of the features used in
the horizontal direction is higher than the classification accuracy of the same
features when used in the vertical direction, except in the case of 2 cycles.

(iii) In the negative half cycle, there is no clear relationship between the
classification accuracy of the features and the direction.
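The polarity factor of Eq. (4.9), introduced earlier in this subsection, can be sketched in a few lines. The function name is hypothetical, and two conventions are assumptions made for the example: negative pulses are represented by their magnitudes, and the average of an empty set of pulses is taken as zero so that the factor reaches its limits of 1 and -1.

```python
def polarity_factor(positive_pulses, negative_pulses):
    """(avg positive - avg negative) / (avg positive + avg negative).

    Pulses are given as magnitudes; the sign of negative pulses is dropped.
    Treating an empty list as an average of zero is an assumption."""
    def average(values):
        return sum(abs(q) for q in values) / len(values) if values else 0.0

    avg_pos = average(positive_pulses)
    avg_neg = average(negative_pulses)
    if avg_pos + avg_neg == 0.0:
        return 0.0  # no pulses at all: factor left at zero by convention
    return (avg_pos - avg_neg) / (avg_pos + avg_neg)

# Hypothetical pulse magnitudes (arbitrary units).
print(polarity_factor([4.0, 6.0], [2.0, 4.0]))  # averages 5 and 3 give 0.25
print(polarity_factor([1.0, 2.0], []))          # only positive pulses give 1.0
```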


4.4.3 Gray level run length method (GLRLM) 

In the case of GLRLM, there are five textural features, as mentioned in chapter 2, which
result in a total of 20 features when calculated for the horizontal and vertical directions in
the positive and negative half cycles. These features are the short run emphasis, long run
emphasis, gray level non-uniformity, run length non-uniformity and run percentage. As
shown in Fig. 4.4, features 1-5 correspond to the horizontal direction of the positive half
cycle and features 6-10 correspond to the vertical direction of the same half cycle. Features
11-15 and 16-20 correspond to the negative half cycle in the horizontal and vertical
directions, respectively. From this figure, the following can be observed:

(i) The relative classification accuracy of the features depends upon the number
of cycles used to construct the patterns.

(ii) The short run emphasis is equally or more effective in classification when
used in the horizontal direction for both the positive and negative half cycles
(features 1, 11) in comparison with the same feature when used in the vertical
direction of the positive and negative half cycles (features 6, 16), except for
the case of 2 cycles. On the other hand, the same feature is, in general, more
effective in the positive half cycle compared to the negative one.

(iii) The long run emphasis is equally or more effective in classification when
used in the vertical direction for the positive and negative half cycles (features
7, 17) in comparison with the same feature when used in the horizontal
direction of the positive and negative half cycles (features 2, 12), except for
the case of 32 cycles. On the other hand, the same feature is generally more
effective in the negative half cycle compared to the positive one.



Fig. 4.4: Classification accuracy of GLRLM features


(iv) The gray level non-uniformity is more effective in classification when used
for the horizontal direction of the positive half cycle (feature 3) or the
vertical direction of the negative half cycle (feature 18).

(v) The run length non-uniformity is more effective in classification when used
for the vertical direction of the positive half cycle (feature 9), except for the
case of 32 cycles. It is also effective in the horizontal direction of the
negative half cycle (feature 14), except for the case of 2 cycles.

(vi) The classification accuracy of the run percentage is higher if it is used in the
vertical direction of the positive and negative half cycles (features 10, 20),
except for the case of 32 cycles.

4.4.4 Power spectrum method (PSM) 

Since the transformed image has the same size as the original one, in the case of PSM the
number of features is not constant. It depends upon the number of cycles used to construct
the partial discharge patterns. For example, Fig. 4.5 shows the power spectrum of a glow
corona. This figure was calculated for 128 cycles with 2048 phase windows for
each cycle. The transformed image was adjusted to bring the average value to its centre, so
that the image is symmetrical around its centre. Since the variation occurs only around the
centre of the image, for clarity only 32 cycles by 32 phase windows from the centre of the
full transformed image are shown in Fig. 4.5. The features generated from the transformed
image are the average values of a number of slits (say H) in the horizontal or vertical
directions. Since there are 2048 phase windows per cycle, the number of features in the
horizontal direction is quite large and depends on the step size H. In this study, it was
selected to be fixed and equal to 6 for all the numbers of cycles. The number of features in
the vertical direction varied according to the number of cycles. Therefore the
numbers of features considered were 14, 14, 16, 18 and 24 for 2, 4, 8, 16 and 32 cycles,
respectively. Half of these features were calculated from the transformed image for the
positive half cycle and the other half from the negative half cycle. In each half of these
features, the last six are those which were calculated for the horizontal direction. The
classification accuracy of these features is shown in Fig. 4.6, which provides the following
observations:



Fig. 4.5: Fourier transform of Glow corona 


(i) The relative classification accuracy of the features depends upon the number
of cycles used to construct the patterns.

(ii) The features which are calculated from the horizontal slits (features 1, 1, 1-2,
1-3, 1-6 for 2, 4, 8, 16, 32 cycles, respectively) are generally more
effective in the classification, especially those which are nearer to the image
centre.

(iii) The features which are calculated for the central slits in the vertical
direction (features 2, 9 for 2, 4 cycles, features 3, 11 for 8 cycles, features 4,
13 for 16 cycles and features 7, 19 for 32 cycles) have the lowest
classification accuracy.

(iv) The classification accuracy of any feature in the negative half cycle (the
second half of the features) is greater than that of the corresponding feature in the
positive half cycle (the first half of the features), except for the case of 32 cycles,
where it is almost the same.
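The construction of the power spectrum features described at the start of this subsection can be sketched as follows. This is only an illustration under stated assumptions: the pattern is taken as a 2-D array of cycles by phase windows, the centring of the average value is done with `numpy.fft.fftshift`, and a slit is interpreted as a band of equal width whose mean value forms one feature. The function names and the exact slit geometry are hypothetical and may differ from the procedure used in this work.

```python
import numpy as np

def power_spectrum(pattern):
    """Centred 2-D power spectrum of a pattern (cycles x phase windows).
    fftshift moves the zero-frequency (average) term to the image centre."""
    transformed = np.fft.fftshift(np.fft.fft2(pattern))
    return np.abs(transformed) ** 2

def slit_features(spectrum, n_slits, axis):
    """Mean of the spectrum over n_slits bands of (nearly) equal width.
    axis=0 splits the rows into bands, axis=1 splits the columns.
    The band layout is an assumption made for this sketch."""
    bands = np.array_split(spectrum, n_slits, axis=axis)
    return np.array([band.mean() for band in bands])

# Hypothetical pattern: 32 cycles by 2048 phase windows of random amplitudes.
rng = np.random.default_rng(1)
pattern = rng.random((32, 2048))

spectrum = power_spectrum(pattern)
horizontal = slit_features(spectrum, n_slits=6, axis=1)   # six column bands
vertical = slit_features(spectrum, n_slits=6, axis=0)     # six row bands
print(spectrum.shape, horizontal.shape, vertical.shape)
```

The transformed image has the same size as the input, which is why the number of slit features grows with the number of cycles used to build the pattern.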



4.4.5 q-φ-n Distributions

In the case of q-φ-n distributions, there are 7 features for each pattern, as given in chapter 1.
These features are the skewness, the kurtosis and the number of peaks for each half cycle, in
addition to the correlation between the positive and negative half cycles. These features
were calculated for three distributions and thus resulted in a total of 21 features. Features 1-7
correspond to the number of pulses in each phase window (n-φ distribution), features 8-14
correspond to the average value of partial discharge pulses in each phase window (qav-φ
distribution) and features 15-21 correspond to the maximum pulse magnitude in each phase
window (qmax-φ distribution). Fig. 4.7 shows the classification accuracy of these features.
It can be seen from this figure that:



Fig. 4.7: Classification accuracy of q-φ-n features


(i) The first distribution (n-φ distribution) has very poor classification accuracy
(features 1-7) compared to the other distributions. The maximum
classification accuracy of all the features calculated from this distribution
was less than 55%.

(ii) The classification accuracy of the correlation between the positive and
negative pulses was very poor in the first and second distributions (features
7, 14), while it slightly improved in the third distribution (feature 21).

(iii) The third distribution (qmax-φ distribution) has the maximum classification
accuracy (features 15-21).

(iv) For the positive half cycle, the skewness (features 1, 8, 15) is more
effective in classification than the kurtosis (features 2, 9, 16).

4.4.6 Effect of increasing the number of features used for classification

From the results of the classification accuracy shown above for the individual features,
it is clear that considering one feature at a time, in any of the techniques, is
not sufficient to get a reasonable classification accuracy. In fact, the best classification
accuracy achieved with a single feature was below 85% for all the methods except GLRLM,
which slightly exceeded 90%. Hence, it is important to consider more than one feature at a
time to improve the classification accuracy. Because of the correlation between the different
features, one cannot directly select the best two features in each technique and use them
together as the best combination to increase the classification accuracy. Therefore, the best
combinations have been established for each technique by investigating the
discriminating power of all possible combinations of two features. The results of the
classification accuracy were obtained with all the techniques for 2, 4, 8, 16
and 32 cycles. However, only the classification accuracies of the combinations using patterns
constructed from two cycles are shown, in Figs. 4.8 to 4.12 for the SGLDM, GLDHM, GLRLM,
PSM and q-φ-n distributions, respectively. Since the main objective is to
determine the combinations which have the best classification accuracy, only the first twenty
combinations are shown for each technique.



Fig. 4.8: Classification accuracy of SGLDM with two features combinations



Fig. 4.9: Classification accuracy of GLDHM with two features combinations



Fig. 4.10: Classification accuracy of GLRLM with two features combinations



Fig. 4.11: Classification accuracy of PSM with two features combinations



Fig. 4.12: Classification accuracy of q-φ-n with two features combinations

It is interesting to note that by considering combinations of two features at a time, the
classification accuracy increased to close to 99% in all the methods except for PSM, where it
was 95%, and for q-φ-n, where it was 96%. In the case of SGLDM, the combinations of the
features (4,20) and (4,27) provide a classification accuracy of 98.89%, whereas in the case of
GLDHM the combinations of the features (1,17) and (7,17) provide a classification
accuracy of 99.44%. For GLRLM, the combinations (3,10), (6,10), (7,10), (8,10) and (9,10)
provide a classification accuracy of 100%. For PSM, the best combination was features (1,8)
with a classification accuracy of about 95%. In the case of q-φ-n, the best combinations were
features (13,17) and (17,20) with classification accuracies of 96% and 95.5%, respectively.
Hence, only two features are sufficient to achieve a reasonable classification accuracy. The
features of the best combinations can be related to the horizontal direction, the
horizontal-vertical directions or the vertical direction. For q-φ-n, the combination (17,20) is
related to the pulse magnitude distribution. It is clear from the above results that there is no
need to use combinations of more than two features. If more features are used, the
computational time will increase without any substantial gain in the classification accuracy.
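The exhaustive search over all pairs of features described in this subsection can be sketched as below. The sketch uses a Mahalanobis minimum distance rule on synthetic data; the function names, the two hypothetical classes and the four-feature patterns (of which only feature 0 is made informative) are assumptions for illustration and do not reproduce the data of this work.

```python
from itertools import combinations
import numpy as np

def fit(train, cols):
    """Mean and inverse covariance of each class for the selected features."""
    stats = {}
    for label, X in train.items():
        S = X[:, cols]
        C = np.atleast_2d(np.cov(S, rowvar=False))
        stats[label] = (S.mean(axis=0), np.linalg.pinv(C))
    return stats

def accuracy(train, test, cols):
    """Fraction of test patterns assigned to their true class."""
    cols = list(cols)
    stats = fit(train, cols)
    correct = total = 0
    for label, X in test.items():
        for x in X[:, cols]:
            dists = {k: float((x - m) @ Ci @ (x - m))
                     for k, (m, Ci) in stats.items()}
            correct += min(dists, key=dists.get) == label
            total += 1
    return correct / total

def rank_pairs(train, test, n_features, top=20):
    """Evaluate every pair of features and return the best `top` pairs."""
    scores = [(accuracy(train, test, pair), pair)
              for pair in combinations(range(n_features), 2)]
    scores.sort(key=lambda item: -item[0])
    return scores[:top]

def make_patterns(rng, n, offset):
    """Synthetic patterns: only feature 0 separates the classes."""
    X = rng.normal(size=(n, 4))
    X[:, 0] += offset
    return X

rng = np.random.default_rng(2)
train = {"A": make_patterns(rng, 15, 0.0), "B": make_patterns(rng, 15, 20.0)}
test = {"A": make_patterns(rng, 30, 0.0), "B": make_patterns(rng, 30, 20.0)}

for acc, pair in rank_pairs(train, test, n_features=4):
    print(pair, round(acc, 3))
```

Because the features are correlated, the ranking is obtained by evaluating each pair directly rather than by combining the two best individual features.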

4.5 Conclusion 

In this chapter four texture analysis algorithms were investigated for partial discharge
source classification. These techniques are the spatial gray level dependence method, gray
level difference histogram method, gray level run length method and power spectrum
method. The conventional method (q-φ-n distribution) has also been used to classify the
same partial discharge sources. The minimum distance classifier has been used as a
supervised classifier to label an unknown partial discharge source. From the work carried
out in this chapter, the following conclusions can be drawn:

1. Texture analysis algorithms have high discriminative power and can be used for partial
discharge source classification. With only one feature, the classification accuracy is
relatively low (about 90% at most); it improves when more features are considered
at a time.

2. Combinations of only two features are adequate for achieving high classification
accuracy in all the techniques. The classification accuracy using combinations of two
features is close to 99% in three of the investigated techniques.

3. These combinations consisted of features related to the horizontal direction, the vertical
direction and also a mixture of both directions.

4. In comparison with the conventional method (q-φ-n distribution), the texture analysis
algorithms are more effective in extracting discriminating information from the partial
discharge patterns, especially when used to classify partial discharge patterns which were
not used for the training of the minimum distance classifier.

5. The relative classification accuracy of the features depends upon the number of cycles
used to construct the partial discharge patterns.

6. The power spectrum method is relatively complicated as it involves complex numbers for
computation rather than real numbers. The calculation time with this method is quite
long and it requires more computer memory for storing the original and the
transformed images.

7. The spatial gray level dependence method and the gray level difference histogram method
do not differ much in their discriminating power. However, the one-dimensional gray level
difference histogram method (GLDHM) performed well and yielded classification
accuracy comparable with that of the other, computationally more expensive, texture
analysis algorithms.
Chapter 5

Feature Selection Using Transformed Divergence Analysis 


5.1 Introduction 

Classification time increases with the number of features used to describe any class or
partial discharge source. Removal of the least effective features in classification is referred
to as feature selection or feature reduction. Separability analysis is a basic approach used to
identify a subset of features within a large number of features to facilitate more efficient
display and classification. Several measures of separability are available to predict the best
features for classification [Tou and Gonzalez, 1974]. These measures are based on the
statistical distance between different classes. The distance between the mean values of
feature sets may be insufficient, since the overlap region is also influenced by the standard
deviation of the distributions. Hence, a combination of both the distance between the mean
values and a measure of standard deviation is required. The Bhattacharyya distance,
divergence, transformed divergence and Jeffreys-Matusita distance can be used for this
purpose. However, the transformed divergence and the Jeffreys-Matusita distance have been
found to have similar performance, better than that of the Bhattacharyya distance and the
divergence measures [Mausel et al, 1990]. Since the Jeffreys-Matusita distance is
computationally more complex [Richards, 1993], the transformed divergence analysis has
been used in this chapter. The main objective of this chapter is to apply the transformed
divergence analysis to determine the best feature, or the best combination of more than one
feature at a time, which ensures distinct separation between the different partial discharge
sources.


5.2 Divergence Concept 

Divergence is a measure of the distance or dissimilarity between two classes. It can be
used to determine feature ranking and to evaluate the effectiveness of class discrimination
[Tou and Gonzalez, 1974]. Let the probability of occurrence of pattern x, given that it
belongs to class $w_i$, be

$$p_i(x) = p(x \mid w_i) \tag{5.1}$$

and the probability of occurrence of pattern x, given that it belongs to class $w_j$, be

$$p_j(x) = p(x \mid w_j) \tag{5.2}$$

The discriminating information for class $w_i$ versus class $w_j$ may be measured by the
logarithm of the likelihood ratio

$$u_{ij} = \ln \frac{p_i(x)}{p_j(x)} \tag{5.3}$$

The average discriminating information for class $w_i$ is then given by

$$I(i,j) = \int_x p_i(x) \ln \frac{p_i(x)}{p_j(x)} \, dx \tag{5.4}$$

The discriminating information for class $w_j$ versus class $w_i$ may be measured by the
logarithm of the likelihood ratio

$$u_{ji} = \ln \frac{p_j(x)}{p_i(x)} \tag{5.5}$$

The average discriminating information for class $w_j$ is then given by

$$I(j,i) = \int_x p_j(x) \ln \frac{p_j(x)}{p_i(x)} \, dx \tag{5.6}$$

The total average information for discriminating class $w_i$ from class $w_j$ is often
referred to as divergence, which is given by

$$J_{ij} = I(i,j) + I(j,i) \tag{5.7}$$

$$J_{ij} = \int_x \left[ p_i(x) - p_j(x) \right] \ln \frac{p_i(x)}{p_j(x)} \, dx \tag{5.8}$$
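
As a numerical check of the definitions above, the following Python sketch evaluates the two average-information integrals and the divergence integral for a pair of one-dimensional densities using a simple midpoint rule. The Gaussian densities, their parameters and the integration limits are assumptions chosen purely for illustration; any pair of strictly positive densities could be substituted.

```python
import math

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def integrate(f, lo, hi, steps=20000):
    """Midpoint rule on the interval [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (k + 0.5) * h) for k in range(steps))

# Hypothetical class-conditional densities p_i(x) and p_j(x).
def p_i(x):
    return gaussian_pdf(x, 0.0, 1.0)

def p_j(x):
    return gaussian_pdf(x, 1.5, 2.0)

lo, hi = -12.0, 14.0  # wide enough that the neglected tails are negligible
I_ij = integrate(lambda x: p_i(x) * math.log(p_i(x) / p_j(x)), lo, hi)  # Eq. (5.4)
I_ji = integrate(lambda x: p_j(x) * math.log(p_j(x) / p_i(x)), lo, hi)  # Eq. (5.6)
J_ij = integrate(lambda x: (p_i(x) - p_j(x)) * math.log(p_i(x) / p_j(x)), lo, hi)  # Eq. (5.8)
print(f"I(i,j) = {I_ij:.5f}, I(j,i) = {I_ji:.5f}, J_ij = {J_ij:.5f}")
```

The sum of the two average-information terms agrees with the single divergence integral, as the definition requires.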


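Suppose one has two pattern classes characterized by two n-variate normal populations

$$N(m_i, C_i) \quad \text{and} \quad N(m_j, C_j)$$

where $m_i$ and $m_j$ are the mean vectors and $C_i$, $C_j$ are the n x n covariance
matrices. The population densities are given as

$$p_i(x) = \frac{1}{(2\pi)^{n/2} |C_i|^{1/2}} \exp\left[ -\frac{1}{2} (x - m_i)^t C_i^{-1} (x - m_i) \right] \tag{5.9}$$

and

$$p_j(x) = \frac{1}{(2\pi)^{n/2} |C_j|^{1/2}} \exp\left[ -\frac{1}{2} (x - m_j)^t C_j^{-1} (x - m_j) \right] \tag{5.10}$$

From this, the logarithm of the likelihood ratio can be obtained as

$$u_{ij} = \frac{1}{2} \ln \frac{|C_j|}{|C_i|} - \frac{1}{2} (x - m_i)^t C_i^{-1} (x - m_i) + \frac{1}{2} (x - m_j)^t C_j^{-1} (x - m_j) \tag{5.11}$$

The average information for discrimination between these two classes will be

$$I(i,j) = \int p_i(x_1, x_2, \dots, x_n) \ln \frac{p_i(x_1, x_2, \dots, x_n)}{p_j(x_1, x_2, \dots, x_n)} \, dx_1 \, dx_2 \cdots dx_n \tag{5.12}$$

$$I(i,j) = \frac{1}{2} \ln \frac{|C_j|}{|C_i|} + \frac{1}{2} \, tr\left[ C_i (C_j^{-1} - C_i^{-1}) \right] + \frac{1}{2} \, tr\left[ C_j^{-1} (m_i - m_j)(m_i - m_j)^t \right] \tag{5.13}$$

Hence, the divergence for these two classes can be expressed as

$$J_{ij} = \int \left[ p_i(x) - p_j(x) \right] \ln \frac{p_i(x)}{p_j(x)} \, dx \tag{5.14}$$

or

$$J_{ij} = \frac{1}{2} \, tr\left[ (C_i - C_j)(C_j^{-1} - C_i^{-1}) \right] + \frac{1}{2} \, tr\left[ (C_i^{-1} + C_j^{-1})(m_i - m_j)(m_i - m_j)^t \right] \tag{5.15}$$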

where tr denotes the trace of a matrix and the superscript t denotes the transpose.
The divergence possesses the following useful properties [Tou and Gonzalez, 1974]:

1. $J_{ij} > 0$ for $i \neq j$

2. $J_{ij} = 0$ for $i = j$

3. $J_{ij} = J_{ji}$

4. $J_{ij}$ is additive for independent features:

   $$J_{ij}(x_1, x_2, \dots, x_m) = \sum_{k=1}^{m} J_{ij}(x_k) \tag{5.16}$$

5. Adding a new feature never decreases the divergence:

   $$J_{ij}(x_1, x_2, \dots, x_m) \le J_{ij}(x_1, x_2, \dots, x_m, x_{m+1}) \tag{5.17}$$
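
The properties listed above can be checked numerically with a small Python sketch. It uses the Gaussian divergence for a single feature (n = 1), where the covariance matrices reduce to the variances, and sums single-feature divergences for independent features. The function names and the class statistics are hypothetical and serve only as an illustration.

```python
def gaussian_divergence(mean_i, var_i, mean_j, var_j):
    """Divergence between two Gaussian classes for a single feature (n = 1):
    a variance term plus a mean-separation term."""
    cov_term = 0.5 * (var_i - var_j) * (1.0 / var_j - 1.0 / var_i)
    mean_term = 0.5 * (1.0 / var_i + 1.0 / var_j) * (mean_i - mean_j) ** 2
    return cov_term + mean_term

def divergence_independent(stats_i, stats_j):
    """Divergence over several independent features (property 4): the sum of
    the single-feature divergences. Each argument is a list of
    (mean, variance) pairs, one pair per feature."""
    return sum(gaussian_divergence(mi, vi, mj, vj)
               for (mi, vi), (mj, vj) in zip(stats_i, stats_j))

# Hypothetical class statistics for three independent features.
class_i = [(0.0, 1.0), (2.0, 0.5), (1.0, 2.0)]
class_j = [(1.5, 4.0), (2.5, 0.5), (2.0, 2.0)]

j_two = divergence_independent(class_i[:2], class_j[:2])
j_three = divergence_independent(class_i, class_j)
print("single feature:", gaussian_divergence(0.0, 1.0, 1.5, 4.0))
print("two features:", j_two, " three features:", j_three)
```

Because every single-feature term is non-negative, adding the third feature cannot lower the total, which is property 5 for the independent case.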

The additive property of divergence implies that, if the features are independent, the
divergence based on m features is equal to the sum of the m divergences based on each
feature separately. This property may be used to determine the relative importance of
each feature to be selected. Features that lead to a large divergence between the classes
are more important, since they carry more discriminatory information. Thus one may rank
the importance of each feature according to its associated divergence. However, the use
of divergence may lead to the following problem in some cases.

If divergence is plotted against the distance between classes, it increases quadratically.
This behavior is misleading if divergence is to be used as an indicator of how successfully
patterns can be discriminated or classified. It implies that, at large separations, a
further increase in the separation will lead to better classification accuracy, whereas in
practice the probability of correct classification levels off. Moreover, easily separable
classes will weight the average divergence upward in a misleading fashion, to the extent
that a sub-optimal reduced feature subset might be indicated as the best. This drawback
of the divergence method can be overcome by using the transformed divergence
[Richards, 1993] described in the next section.
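
The mismatch between the quadratic growth of divergence and the saturation of classification accuracy can be made concrete with a short Python sketch. It assumes two one-dimensional Gaussian classes with equal priors and a common standard deviation, for which the Gaussian divergence reduces to (gap / sigma)^2 and the probability of correct classification is Phi(gap / (2 sigma)). The function names and the chosen gaps are illustrative assumptions.

```python
import math

def divergence_equal_variance(mean_gap, sigma):
    """Gaussian divergence for one feature with equal variances."""
    return (mean_gap / sigma) ** 2

def prob_correct(mean_gap, sigma):
    """Probability of correct classification for two equal-prior Gaussian
    classes with common sigma: Phi(gap / (2 * sigma))."""
    z = mean_gap / (2.0 * sigma)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

rows = [(gap, divergence_equal_variance(gap, 1.0), prob_correct(gap, 1.0))
        for gap in (1.0, 2.0, 4.0, 8.0, 16.0)]
for gap, j, pc in rows:
    print(f"gap = {gap:5.1f}   J = {j:7.1f}   P(correct) = {pc:.6f}")
```

Doubling the separation always quadruples the divergence, while the probability of correct classification is already indistinguishable from 1 at the larger gaps.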

5.3 Transformed Divergence 

A useful modification of divergence that overcomes this drawback is the transformed
divergence, which is given by

$$J^T_{ij} = 2 \left[ 1 - \exp\left( -\frac{J_{ij}}{8} \right) \right] \tag{5.18}$$

The exponential factor gives an exponentially decreasing weight to increasing separation
between the partial discharge classes, similar to that expected for the probability of
correct classification. In other words, the measure saturates with increasing separation
of the classes and is asymptotic to 2.0. Thus a transformed divergence of 2.0 between
partial discharge sources would imply that the classes of these sources can be identified
with 100% classification accuracy. This saturating behavior is highly desirable, since it
avoids the difficulty experienced with the divergence method.
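
A minimal Python sketch of the transformed divergence, in the form 2[1 - exp(-J/8)] given by Richards, is shown below. The sample divergence values are those listed for feature 18 of GLDHM in Table 5.4; the function name and the selection of pairs are illustrative choices.

```python
import math

def transformed_divergence(divergence):
    """Transformed divergence: saturates at 2.0 as the divergence grows."""
    return 2.0 * (1.0 - math.exp(-divergence / 8.0))

# Divergence values as listed in Table 5.4 (feature 18 of GLDHM).
samples = {
    "class 2 vs 6": 5.043419,
    "class 3 vs 4": 8.136919,
    "class 4 vs 6": 29.73506,
    "class 1 vs 2": 9.11e14,
}
for pair, j in samples.items():
    print(f"{pair}: J = {j:.6g}  ->  TD = {transformed_divergence(j):.6f}")
```

The transformed values agree with the corresponding entries of Table 5.5 to the printed precision, and the very large divergence involving class 1 is simply mapped to the ceiling of 2.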

5.4 Use of Transformed Divergence for Feature Selection

Consider a case of M partial discharge sources (classes), each having a total of N
features, where the best subset of n features has to be selected. The transformed
divergence between each pair of classes is determined for all combinations of n features
out of the N features. An average indication of separability is then given by computing
the average transformed divergence

$$J^T_{ave} = \sum_{i=1}^{M} \sum_{j=1}^{M} p(w_i) \, p(w_j) \, J^T_{ij} \tag{5.19}$$

where $p(w_i)$ and $p(w_j)$ are the class prior probabilities.
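
The selection procedure described above can be sketched in Python as an exhaustive search over feature subsets. Several assumptions are made here and are not taken from the thesis: features are treated as independent Gaussians, so that the divergence of a subset is the sum of the single-feature divergences; the prior-weighted sum runs over distinct class pairs and is normalised by the total pair weight, so that the result stays on the 0 to 2 scale and reduces to the plain mean over class pairs for equal priors; and all names and class statistics are hypothetical.

```python
import math
from itertools import combinations

def divergence_1d(mi, vi, mj, vj):
    """Gaussian divergence for one feature with means mi, mj and variances vi, vj."""
    return 0.5 * (vi - vj) * (1 / vj - 1 / vi) + 0.5 * (1 / vi + 1 / vj) * (mi - mj) ** 2

def average_td(class_stats, subset, priors):
    """Prior-weighted average transformed divergence over distinct class pairs
    for the feature indices in `subset` (independent features assumed)."""
    total, weight = 0.0, 0.0
    for a, b in combinations(range(len(class_stats)), 2):
        j = sum(divergence_1d(*class_stats[a][f], *class_stats[b][f]) for f in subset)
        td = 2.0 * (1.0 - math.exp(-j / 8.0))
        total += priors[a] * priors[b] * td
        weight += priors[a] * priors[b]
    return total / weight

def best_subset(class_stats, n, priors):
    """Return the n-feature subset with the highest average transformed divergence."""
    n_features = len(class_stats[0])
    return max(combinations(range(n_features), n),
               key=lambda s: average_td(class_stats, s, priors))

# Hypothetical (mean, variance) per feature for M = 3 classes and N = 3 features.
stats = [
    [(0.0, 1.0), (5.0, 1.0), (0.0, 1.0)],
    [(0.2, 1.0), (5.0, 1.0), (4.0, 1.0)],
    [(0.1, 1.0), (9.0, 1.0), (4.2, 1.0)],
]
priors = [1 / 3, 1 / 3, 1 / 3]
print("best single feature:", best_subset(stats, 1, priors))
print("best pair of features:", best_subset(stats, 2, priors))
```

In this toy data set no single feature separates all three classes, but the second and third features complement each other, so the best pair scores clearly higher than the best single feature.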

5.5 Results and Discussion 

In this study, six different partial discharge sources, created in the laboratory as
described in chapter 3, have been used. The four texture analysis algorithms (gray level
difference histogram method, spatial gray level dependence method, gray level run length
method and power spectrum method) as well as the q-φ-n distributions have been used to
generate the features from which the average transformed divergence is calculated. Using
each feature of these algorithms individually, the transformed divergence between every
pair of classes was calculated. The average transformed divergence between the six classes
for each feature of the four texture analysis algorithms and the q-φ-n distributions is
given in the following sub-sections.
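
The single-feature evaluation described above can be sketched as follows in Python. For each feature, the class mean and variance are estimated from the measured patterns, the transformed divergence is computed for every pair of classes under a Gaussian assumption, and the pairwise values are averaged. The sample values, the feature names and the use of the population variance are hypothetical choices made for illustration only.

```python
import math
from itertools import combinations
from statistics import mean, pvariance

def divergence_1d(mi, vi, mj, vj):
    """Gaussian divergence for one feature with means mi, mj and variances vi, vj."""
    return 0.5 * (vi - vj) * (1 / vj - 1 / vi) + 0.5 * (1 / vi + 1 / vj) * (mi - mj) ** 2

def average_td_for_feature(samples_by_class):
    """samples_by_class holds one list of feature values per class.
    Returns the mean transformed divergence over all class pairs."""
    stats = [(mean(s), pvariance(s)) for s in samples_by_class]
    tds = [2.0 * (1.0 - math.exp(-divergence_1d(*a, *b) / 8.0))
           for a, b in combinations(stats, 2)]
    return sum(tds) / len(tds)

# Hypothetical measurements: feature name -> classes -> sample values.
features = {
    "f1": [[1.0, 1.2, 0.8, 1.1], [1.1, 0.9, 1.3, 1.0], [0.9, 1.0, 1.2, 1.1]],
    "f2": [[1.0, 1.2, 0.8, 1.1], [5.0, 5.3, 4.8, 5.1], [9.1, 8.8, 9.0, 9.3]],
}
ranking = sorted(features, key=lambda name: average_td_for_feature(features[name]),
                 reverse=True)
for name in ranking:
    print(name, round(average_td_for_feature(features[name]), 4))
```

A class whose samples are all identical would have zero variance and make the divergence undefined, so a practical implementation would need to guard against that case.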

5.5.1 Gray level difference histogram method

The gray level difference histogram method has five features (f1, f2, ..., f5) plus the
polarity factor (f6), as described in chapter 2. Using these features for the horizontal
and vertical directions of the positive and negative half cycles, the average transformed
divergence between the six partial discharge sources is shown in Fig. 5.1 for different
numbers of cycles. Fig. 5.2 shows the effect of changing the number of cycles on the
discriminating power of the features when used in the horizontal and vertical directions
of the positive and negative half cycles, respectively. The numbers of cycles were 2, 4,
8, 16 and 32, denoted in Fig. 5.2 as A, B, C, D and E, respectively. In this figure, the
symbols A, B, C, D, E are repeated four times: the first set for the features applied to
the horizontal direction of the positive half cycle, the second for the vertical direction
of the positive half cycle, and the third and fourth for the horizontal and vertical
directions of the negative half cycle, respectively. From Figs. 5.1 and 5.2 it can be
noticed that:

(i) Feature number 17 gives the maximum separation between the partial discharge
sources, with average transformed divergence values of 1.99, 1.99, 1.88, 1.83 and 1.75
for 2, 4, 8, 16 and 32 cycles, respectively (Fig. 5.1).

(ii) Although the discriminating power of feature number 17 decreases with increasing
number of cycles, it still has the maximum discriminating power compared to the other
features, except in the case of 32 cycles (Fig. 5.1).

(iii) In the positive half cycle, the features describing the horizontal direction have
almost the same average transformed divergence as the features describing the vertical
direction. For a given direction and number of cycles, the features f1, f2, f4 and f5
have the same average transformed divergence (Fig. 5.2).

(iv) In the negative half cycle, for a given direction and number of cycles, the features
f2, f4 and f5 have the same average transformed divergence (Fig. 5.2).

(v) In the same half cycle, increasing the number of cycles has the same effect on the
features used with the horizontal and the vertical directions (Fig. 5.2).

Fig. 5.1: Average transformed divergence between the six partial discharge sources
using the GLDHM features

Fig. 5.2: Effect of the number of cycles on the average transformed divergence
between the six partial discharge sources using the GLDHM features
(A-2 cycles, B-4 cycles, C-8 cycles, D-16 cycles, E-32 cycles)


5.5.2 Spatial gray level dependence method 

The spatial gray level dependence method has seven features (f1, f2, ..., f7), as described
in chapter 2. The transformed divergence of these features, when applied to the positive
and negative half cycles in the horizontal and vertical directions, is shown in Fig. 5.3.
The effect of increasing the number of cycles on the discriminating power of these features
when used in the horizontal and vertical directions of the positive and negative half
cycles, respectively, is shown in Fig. 5.4. From these figures, it can be observed that:

(i) The correlation between the partial discharge pulses in the vertical direction of the
positive half cycle (feature number 14) and of the negative half cycle (feature number 28)
has the worst discriminating power (Fig. 5.3).

(ii) Except for the correlation, in each half cycle, the effect of increasing the number
of cycles on the relative discriminating power is the same in the horizontal and vertical
directions (Fig. 5.4).



Fig. 5.3: Average transformed divergence between the six partial discharge sources
using the SGLDM features (horizontal axis: feature number; curves for 2, 4, 8, 16 and
32 cycles)

Fig. 5.4: Effect of the number of cycles on the average transformed divergence
between the six partial discharge sources using the SGLDM features
(A-2 cycles, B-4 cycles, C-8 cycles, D-16 cycles, E-32 cycles)


(iii) In both directions of the positive half cycle, the discriminating power of the
features f1, f3 and f6 increases with increasing number of cycles (Fig. 5.4).

(iv) In each half cycle, the contrast (f3) and the variance (f6) have the same
discriminating power in both directions (Fig. 5.4).

(v) For a given direction and number of cycles, the angular second moment (f2), the
entropy (f4) and the IDM (f5) have the same discriminating power (Fig. 5.4).
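
The feature numbers used in the figures can be decoded with a small helper, sketched below in Python. The block ordering (horizontal then vertical direction of the positive half cycle, followed by horizontal then vertical direction of the negative half cycle) is an inference from observation (i), where features 14 and 28 are both the correlation (f7) in the vertical direction of the positive and negative half cycles; it is not stated explicitly in the text. The function name is hypothetical.

```python
def decode_feature(number, per_block):
    """Map a 1-based feature number to (feature label, direction, half cycle),
    assuming four consecutive blocks of `per_block` features ordered as
    horizontal/positive, vertical/positive, horizontal/negative, vertical/negative."""
    blocks = [("horizontal", "positive"), ("vertical", "positive"),
              ("horizontal", "negative"), ("vertical", "negative")]
    if not 1 <= number <= 4 * per_block:
        raise ValueError("feature number out of range")
    block, offset = divmod(number - 1, per_block)
    direction, half_cycle = blocks[block]
    return f"f{offset + 1}", direction, half_cycle

# SGLDM has seven features per direction and half cycle.
print(decode_feature(14, 7))
print(decode_feature(28, 7))
```

With six features per block, as for GLDHM, the same assumed ordering would place feature number 17 in the horizontal direction of the negative half cycle.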

5.5.3 Gray level run length method 

The average transformed divergence between the six partial discharge sources using the
five features of the gray level run length method (f1, f2, ..., f5), as described in
chapter 2, is shown in Fig. 5.5. Fig. 5.6 shows the effect of increasing the number of
cycles on the average transformed divergence of the gray level run length method features
when used in the horizontal and vertical directions of the positive and negative half
cycles, respectively. From these figures, the following observations can be made:

Fig. 5.5: Average transformed divergence between the six partial discharge sources
using the GLRLM features

Fig. 5.6: Effect of the number of cycles on the average transformed divergence
between the six partial discharge sources using the GLRLM features
(A-2 cycles, B-4 cycles, C-8 cycles, D-16 cycles, E-32 cycles)

(i) Run length non-uniformity (f4) has the worst discriminating power when used in the
horizontal direction of the positive half cycle (Fig. 5.5).

(ii) Gray level non-uniformity (f3) has the same discriminating power when used in the
horizontal and vertical directions of the negative half cycle (Fig. 5.6).

(iii) The run percentage (f5) has the best discriminating power when used in the vertical
direction of any half cycle, irrespective of the number of cycles used (Fig. 5.6).

5.5.4 Power spectrum method 

The number of features in this method depends upon the number of power frequency cycles
used to construct the partial discharge patterns, as shown in Fig. 5.7 and as described in
chapter 2. As shown in Fig. 5.8, for a given polarity (positive or negative half cycle)
and number of cycles, the features derived from the horizontal direction have the same
discriminating power. From Fig. 5.8, the following observations can be made:

Fig. 5.7: Average transformed divergence between the six partial discharge sources
using the PSM features

Fig. 5.8: Effect of the number of cycles on the average transformed divergence
between the six partial discharge sources using the PSM features
(A-2 cycles, B-4 cycles, C-8 cycles, D-16 cycles, E-32 cycles)

(i) Increasing the number of power frequency cycles improves the discriminating power of
the features derived from the horizontal or vertical directions in the positive half
cycle. However, the opposite effect has been noticed in the negative half cycle (Fig. 5.8).

(ii) With a lower number of cycles, the features derived in both the horizontal and
vertical directions in the negative half cycle are much better than the corresponding
features in the positive half cycle (Fig. 5.8).

(iii) Feature number 6 has the best discriminating power when derived from the vertical
direction of the negative half cycle. However, its discriminating power is very poor when
derived from the vertical direction of the positive half cycle (Fig. 5.8).

(iv) Feature number 1 has very poor discriminating power when derived from the vertical
direction of the negative half cycle (Fig. 5.8).

5.5.5 q-φ-n distributions method

From the three-dimensional q-φ-n distribution, three two-dimensional distributions have
been derived. These are the number of pulses vs. phase angle (n-φ), the average partial
discharge pulse magnitude vs. phase angle (ave.-φ) and the maximum partial discharge pulse
magnitude vs. phase angle (max.-φ). Each distribution has seven features (f1, f2, ..., f7)
as described in chapter 1. Fig. 5.9 shows the discriminating power of the different
features for different numbers of cycles, and Fig. 5.10 shows the effect of increasing the
number of power frequency cycles on the discriminating power of these features. The letters
A, B, C, D and E, which correspond to the number of cycles used (2, 4, 8, 16 and 32 cycles,
respectively), are repeated three times, corresponding to the three distributions mentioned
above. From Figs. 5.9 and 5.10, the following observations can be made:

Fig. 5.9: Average transformed divergence between the six partial discharge sources
using the q-φ-n distributions features

Fig. 5.10: Effect of the number of cycles on the average transformed divergence
between the six partial discharge sources using the q-φ-n distributions features
(A-2 cycles, B-4 cycles, C-8 cycles, D-16 cycles, E-32 cycles)


(i) The correlation between the partial discharge pulses in the positive and negative
half cycles (f7 in Fig. 5.10) has poor discriminating power in the second distribution
(feature 14 in Fig. 5.9) compared with the correlation in the first and third
distributions (features 7 and 21 in Fig. 5.9).

(ii) The features derived from the third distribution are better than the features
derived from the other distributions (Fig. 5.9).

(iii) In the positive half cycle of the first distribution, skewness (feature 1 in
Fig. 5.9) is better than kurtosis (feature 2 in Fig. 5.9). They have almost the same
discriminating power in the other distributions (features 8, 9 and 15, 16 in Fig. 5.9).
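
The three two-dimensional distributions introduced at the start of this subsection can be derived from a list of recorded pulses as sketched below in Python. The number of phase windows, the pulse values and the function name are hypothetical, and pulse magnitudes are assumed to be non-negative (absolute values).

```python
def phase_resolved_distributions(pulses, n_windows):
    """pulses: iterable of (phase_deg, magnitude) with 0 <= phase_deg < 360.
    Returns three lists indexed by phase window: the pulse count (n-phi),
    the average magnitude (ave.-phi) and the maximum magnitude (max.-phi)."""
    counts = [0] * n_windows
    sums = [0.0] * n_windows
    maxima = [0.0] * n_windows
    width = 360.0 / n_windows
    for phase, magnitude in pulses:
        w = min(int(phase // width), n_windows - 1)
        counts[w] += 1
        sums[w] += magnitude
        maxima[w] = max(maxima[w], magnitude)
    averages = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return counts, averages, maxima

# Hypothetical pulses: (phase in degrees, apparent charge magnitude).
pulses = [(10.0, 5.0), (20.0, 7.0), (100.0, 3.0), (200.0, 4.0), (215.0, 8.0)]
counts, averages, maxima = phase_resolved_distributions(pulses, n_windows=4)
print("n-phi:   ", counts)
print("ave.-phi:", averages)
print("max.-phi:", maxima)
```

The statistical features (such as skewness and kurtosis) would then be computed separately on each of the three resulting distributions.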

5.5.6 Effect of increasing the number of features used for measuring the separation

All the above results were obtained using one feature at a time. From these results, the
maximum average transformed divergence between the classes at different numbers of power
frequency cycles was found, as given in Table 5.1.

Table 5.1: Maximum average transformed divergence obtained for the different
algorithms at different numbers of power frequency cycles.

Number of cycles    2        4        8        16       32
MaxTD GLDHM         1.999             1.88     1.83     1.84
MaxTD SGLDM         1.9               1.78     1.73     1.86
MaxTD GLRLM         1.86     1.88     1.79     1.90     1.91
MaxTD PSM           1.87     1.88     1.77     1.63     1.72
MaxTD q-φ-n                  1.69     1.69     1.74     1.83

The above table reveals the following:

(i) The features of GLDHM have good discriminating power for the different numbers of
cycles in comparison with the other techniques.

(ii) The features of the q-φ-n distributions have relatively poor discriminating power
compared to the features of GLDHM.

(iii) More than one feature has to be used at a time to improve the separability between
the different classes.

Hence, combinations of two features at a time were used to find the combination with the
maximum discriminating power. The average transformed divergence between the six partial
discharge sources was investigated using all possible combinations. The main objectives
have been to:

• check whether two features are sufficient for complete separation between the classes.

• investigate the relationship between the directions of the features in each
combination, in order to determine the direction (horizontal or vertical) providing the
best combinations.

To meet the above objectives, the number of combinations having an average transformed
divergence greater than certain values was determined. These values were 2, 1.95, 1.9,
1.85, 1.8, 1.75, 1.7, 1.65, 1.6, 1.55 and 1.5. The combinations exceeding any one of these
values were divided into three categories:

• Horizontal (Hor.), if both features were related to the horizontal direction.

• Vertical (Ver.), if both features were related to the vertical direction.

• Mixed (Mix.), if one feature was related to the horizontal direction while the second
one was related to the vertical direction.

For each texture algorithm, the number of combinations in the horizontal, vertical or
mixed directions is given as a percentage of the total number of possible combinations for
that algorithm. Figs. 5.11a to 5.11e show the distribution of the combinations for the
texture analysis algorithms and for the q-φ-n distributions. In each figure, the values
2, 1.95, ..., 1.5 are repeated five times, corresponding to the numbers of power frequency
cycles used, i.e. 2, 4, 8, 16 and 32 cycles.
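
The counting of two-feature combinations by category can be sketched in Python as follows. Several points are assumptions made for illustration: the feature numbering is taken to run in four blocks (horizontal/positive, vertical/positive, horizontal/negative, vertical/negative); the comparison uses "greater than or equal to" so that the threshold of 2 can be met; and the average transformed divergence values of the pairs are hypothetical.

```python
import math

THRESHOLDS = [2.0, 1.95, 1.9, 1.85, 1.8, 1.75, 1.7, 1.65, 1.6, 1.55, 1.5]

def direction(feature_number, per_block):
    """Direction of a 1-based feature number under the assumed block order:
    even-indexed blocks are horizontal, odd-indexed blocks are vertical."""
    return "Hor." if ((feature_number - 1) // per_block) % 2 == 0 else "Ver."

def category(pair, per_block):
    d1, d2 = (direction(f, per_block) for f in pair)
    return d1 if d1 == d2 else "Mix."

def percentages(avg_td, per_block, n_features, threshold):
    """Share (in % of all possible pairs) of feature pairs whose average
    transformed divergence reaches `threshold`, split by category."""
    total = math.comb(n_features, 2)
    counts = {"Hor.": 0, "Ver.": 0, "Mix.": 0}
    for pair, td in avg_td.items():
        if td >= threshold:
            counts[category(pair, per_block)] += 1
    return {name: 100.0 * count / total for name, count in counts.items()}

# Hypothetical average transformed divergence for every pair of four features
# (one feature per block, so features 1 and 3 are horizontal, 2 and 4 vertical).
avg_td = {(1, 2): 1.99, (1, 3): 2.0, (1, 4): 1.7,
          (2, 3): 1.96, (2, 4): 1.88, (3, 4): 1.4}
for threshold in (2.0, 1.95, 1.5):
    print(threshold, percentages(avg_td, per_block=1, n_features=4, threshold=threshold))
```

Looping over the full `THRESHOLDS` list for each number of cycles would give the kind of data plotted in Figs. 5.11a to 5.11e.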

Fig. 5.11a shows how the percentage of combinations changes with the value of the average
transformed divergence, for the different numbers of cycles, in the case of GLDHM. From
this figure, it can be noticed that the number of combinations from the horizontal
direction only is greater than the number from the vertical direction. The difference
between them is almost constant.

Fig. 5.11b shows the results with the SGLDM. The combinations related to the horizontal
direction outnumber those related to the vertical direction at the higher values of the
average transformed divergence. As the average transformed divergence decreases, the
percentages of the combinations related to the horizontal and vertical directions become
the same.

Fig. 5.11c shows the results with GLRLM. In this case, the percentages of the combinations
in the horizontal and vertical directions are the same for the different values of the
average transformed divergence as well as for the different numbers of cycles.

In the case of PSM, as shown in Fig. 5.11d, there is no combination of features related
to the horizontal direction at the lower numbers of cycles. However, with increasing
number of cycles, the combinations from the horizontal direction become comparable to
those from the vertical direction.

In the case of the q-φ-n distributions, the combinations were divided into six categories:
combinations from (n-φ) only, from (n-φ) and (ave.-φ), from (n-φ) and (max.-φ), from
(ave.-φ) only, from (ave.-φ) and (max.-φ), and from (max.-φ) only. From Fig. 5.11e, it can
be observed that the feature combinations from the first distribution (n-φ) have poor
discriminating power compared to the other distributions. Also, the percentage of
combinations drawn from different distributions is better than that of combinations from
any single distribution.

Fig. 5.11a: GLDHM
Fig. 5.11b: SGLDM
Fig. 5.11c: GLRLM
Fig. 5.11d: PSM
Fig. 5.11e: q-φ-n distributions

Fig. 5.11: The relation between the features belonging to the combinations
and the direction used to generate these features for different numbers of cycles
and different average transformed divergences

Thus, from Fig. 5.11, the following general observations can be made:

(i) In all the different techniques, there are some combinations which have an average
transformed divergence equal to 2.

(ii) In the case of GLDHM, SGLDM and GLRLM, the number of combinations from the
horizontal direction is almost equal to the number from the vertical direction. Also, the
number of combinations from mixed features has almost the same value for the different
numbers of cycles.

(iii) In the case of the four texture analysis algorithms, the number of mixed
combinations is almost equal to the sum of the horizontal and vertical combinations.

(iv) With increasing number of cycles, the total number of combinations greater than a
certain average transformed divergence increases. In other words, the curve of the total
number of combinations reaches the level of 100% at higher values of the average
transformed divergence.

(v) The combinations of the GLDHM, SGLDM and GLRLM methods are much better than those of
the PSM and the q-φ-n distributions. The percentages of the total number of combinations
were 100%, 95% and 100% at average transformed divergences of 1.85, 1.5 and 1.8,
respectively, in the case of 32 cycles.

5.6 Comparison of Results With Divergence Method 

For comparison, the average divergence between the six partial discharge sources was
calculated using the GLDHM only, as shown in Fig. 5.12. From this figure, one can notice
that the divergence has no upper limit. It indicates that most of the features used for
the negative half cycle have a higher average divergence than the features used for the
positive half cycle. It also gives the misleading result that feature number 17 (which is
the best feature, as shown in Fig. 5.1) has no discriminating power compared to the other
features. At the same time it indicates that, for example, feature number 18 has better
discriminating power than feature number 17. Actually, feature number 17 has better
discriminating power than feature number 18, as shown in Figs. 5.13 and 5.14, which give
the distributions of the different partial discharge classes using features number 17 and
18. In these figures, 180 patterns were used, consisting of 30 patterns for each class.
As shown in Fig. 5.13, each class is concentrated around its mean value and there is no
overlap between the classes 1, 3, 4, 5 and 6. On the other hand, with feature number 18
there is overlap between classes 1, 2 and between classes 2, 3, 4 as well as 3, 6.

Fig. 5.12: Average divergence between the six partial discharge sources
using the GLDHM features

Fig. 5.13: Distribution of the patterns of the six partial discharge sources
using feature number 17

Fig. 5.14: Distribution of the patterns of the six partial discharge sources
using feature number 18


Tables 5.2 and 5.3 show the divergence and the transformed divergence between the partial
discharge sources using feature number 17, while Tables 5.4 and 5.5 show the divergence
and the transformed divergence between the partial discharge sources using feature number
18. Comparing Tables 5.2 and 5.4, it can be noticed that, except for the divergence
between the first class and the other classes, feature number 17 gives better divergence
between the classes than feature number 18. However, the average divergences between the
classes using features 17 and 18 are 9.68E+12 and 2.77E+15, respectively, implying that
feature number 18 is better than feature number 17. This means that if the divergence
between only two classes, using a certain feature, is very high, the average divergence
between all the classes will be high, and this feature will be considered the best
irrespective of the divergence between the other classes. It should be noted that the low
standard deviation of class 1 is the reason for the high divergence between that class
and the other classes.

On the other hand, comparing Tables 5.3 and 5.5 shows that the transformed divergence
between the classes using feature number 17 is much better than the transformed divergence
using feature number 18. The average transformed divergences between the classes using
features 17 and 18 were 1.99 and 1.68, respectively. From the above, it is clear that the
average transformed divergence is a more reasonable criterion for feature selection than
the divergence.
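
The effect of a very small standard deviation in one class, noted above for class 1, can be reproduced with a short Python sketch. It assumes one-dimensional Gaussian classes; the means and standard deviations are hypothetical and only the trend matters.

```python
import math

def divergence_1d(mi, vi, mj, vj):
    """Gaussian divergence for one feature with means mi, mj and variances vi, vj."""
    return 0.5 * (vi - vj) * (1 / vj - 1 / vi) + 0.5 * (1 / vi + 1 / vj) * (mi - mj) ** 2

def transformed_divergence(j):
    return 2.0 * (1.0 - math.exp(-j / 8.0))

# Hypothetical pair of classes: class "a" has a shrinking standard deviation,
# class "b" keeps unit variance and a mean one unit away.
results = []
for sigma_a in (1.0, 1e-2, 1e-4, 1e-6):
    j = divergence_1d(0.0, sigma_a ** 2, 1.0, 1.0)
    results.append((sigma_a, j, transformed_divergence(j)))
    print(f"sigma_a = {sigma_a:g}: J = {j:.3e}, TD = {results[-1][2]:.6f}")
```

The divergence grows roughly as 1 / sigma_a^2 and can therefore dominate any average, while the transformed divergence stays bounded by 2.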


Table 5.2: Divergence between six partial discharge sources using feature
number 17 of GLDHM

           Class 1    Class 2     Class 3     Class 4     Class 5     Class 6
Class 1    0          1.52E+13    2.75E+13    2.19E+13    3.23E+13    3.59E+13
Class 2               0           104.5221    3482.655    5628.897    1377.131
Class 3                           0           133.1455    100.7227    56.13154
Class 4                                       0           752.3771    749.5209
Class 5                                                   0           44.29177
Class 6                                                               0


Table 5.3: Transformed divergence between six partial discharge sources
using feature number 17 of GLDHM

           Class 1    Class 2     Class 3     Class 4     Class 5     Class 6
Class 1    0          2           2           2           2           2
Class 2               0           1.999996    2           2           2
Class 3                           0           2           1.999993    1.998206
Class 4                                       0           2           2
Class 5                                                   0           1.992119
Class 6                                                               0


Table 5.4: Divergence between six partial discharge sources using feature
number 18 of GLDHM

           Class 1    Class 2     Class 3     Class 4     Class 5     Class 6
Class 1    0          9.11E+14    1.13E+15    1.69E+14    3.40E+16    3.25E+15
Class 2               0           5.555839    9.00813     58.90788    5.043419
Class 3                           0           8.136919    149.3223    4.092518
Class 4                                       0           421.3721    29.73506
Class 5                                                   0           51.7099
Class 6                                                               0

Table 5.5: Transformed divergence between six partial discharge sources
using feature number 18 of GLDHM

           Class 1    Class 2     Class 3     Class 4     Class 5     Class 6
Class 1    0          2           2           2           2           2
Class 2               0           1.001332    1.351355    1.998732    0.935272
Class 3                           0           1.276726    2           0.800887
Class 4                                       0           2           1.951381
Class 5                                                   0           1.996882
Class 6                                                               0
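
The relationship between Tables 5.4 and 5.5 can be checked with the following Python sketch, which applies the transformed divergence 2[1 - exp(-J/8)] to each divergence listed for feature 18 and then averages over the 15 class pairs. Averaging with equal weight per pair is an assumption about how the quoted averages were formed. Two divergence entries (class 2 vs 3 and class 2 vs 4) are partly illegible in the source table and are entered here as best read.

```python
import math

def transformed_divergence(j):
    return 2.0 * (1.0 - math.exp(-j / 8.0))

# Upper-triangular divergences for feature 18 (Table 5.4), keyed by class pair.
table_5_4 = {
    (1, 2): 9.11e14, (1, 3): 1.13e15, (1, 4): 1.69e14, (1, 5): 3.40e16, (1, 6): 3.25e15,
    (2, 3): 5.555839, (2, 4): 9.00813, (2, 5): 58.90788, (2, 6): 5.043419,
    (3, 4): 8.136919, (3, 5): 149.3223, (3, 6): 4.092518,
    (4, 5): 421.3721, (4, 6): 29.73506,
    (5, 6): 51.7099,
}

td = {pair: transformed_divergence(j) for pair, j in table_5_4.items()}
average_td = sum(td.values()) / len(td)

# Fraction of the summed divergence that comes from pairs involving class 1.
class1_share = (sum(j for (a, _), j in table_5_4.items() if a == 1)
                / sum(table_5_4.values()))

for pair in sorted(td):
    print(pair, f"{td[pair]:.6f}")
print(f"average transformed divergence = {average_td:.4f}")
print(f"share of summed divergence from class-1 pairs = {class1_share:.6f}")
```

The transformed values match Table 5.5 to the printed precision and their unweighted mean is about 1.69, in line with the value of 1.68 quoted in the text, while practically all of the summed divergence comes from the five pairs involving class 1.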


5.7 Conclusion 

In this chapter, the average transformed divergence has been used to measure the
separability between the different partial discharge sources. From the results achieved in
this chapter, it was found that:

(i) The GLDHM features have good discriminating power between the different partial
discharge sources in comparison with the other techniques.

(ii) The number of combinations of features related to the horizontal direction is equal
to or greater than the number of combinations of features related to the vertical
direction.

(iii) The optimum features which give high separation between the different classes
according to the transformed divergence analysis (these features are supposed to give
maximum classification accuracy) are slightly different from those determined according
to the minimum distance classification accuracy given in chapter 4. For example, the
correlation (feature number 7) of the SGLDM, when used with the horizontal direction of
the positive half cycle, gives the maximum classification accuracy with the minimum
distance classifier, whereas with the transformed divergence it was not the best. It is
suspected that the non-Gaussian distribution of some classes is the cause of the
disagreement between the classification accuracy results and the transformed divergence
analysis results.

(iv) With the transformed divergence, the effect of changing the number of cycles on the
discriminating power of the features was slightly reduced compared with the result of the
minimum distance classifier.

Chapter 6

Neural Network Based Partial Discharge Patterns Classification 


6.1 Introduction 

In chapter 4, the minimum distance classifier was used for partial discharge source
classification with features based on four texture analysis techniques, which offered
better classification accuracy than the conventional q-φ-n method. To use the minimum
distance classifier, like any other statistical classifier, various preliminary conditions,
e.g. data from normal populations, must be fulfilled in order to carry out the analysis
[Tou and Gonzalez, 1974]. However, it was observed that some partial discharge sources
generate classes that deviate from the normal distribution. This deviation creates a
problem with the minimum distance classifier as well as with feature selection techniques
like the transformed divergence analysis. On the other hand, the artificial neural network
(ANN) approach basically belongs to the nonparametric methods. Hence, it is not necessary
to make any assumption about the data structure. ANNs also have the ability to form
nonlinear decision boundaries between the different classes in the feature space. Although
the field of ANNs is now a well established area of endeavor, its application to partial
discharge is relatively recent. Some of the applications of ANNs in the field of partial
discharge source classification are as follows.

A three-layer feed-forward ANN with the back propagation learning algorithm has been used
for the automatic discrimination of partial discharge, generated in an XLPE cable, from
noise [Suzuki et al, 1992]. Three types of input patterns, among them φ-q-n, were used.
It was noticed that the ANN could easily discriminate partial discharge from noise by
using φ-q-n patterns.

An ANN using the back propagation method has been applied to the discrimination of partial
discharge patterns before and after tree initiation from a needle-shaped void [Hozumi et
al, 1992]. The input patterns were based on the φ-n and φ-q distributions. A similar
technique with the same input patterns was used to classify seven different partial
discharge sources by using three-dimensional distributions [Satish et al, 1994].

Another approach, based on 15 statistical parameters such as skewness, kurtosis, number of
peaks and cross correlation, has been used to define the input patterns of the ANN [Gulski
et al, 1993]. A back propagation network, a Kohonen self-organising map and a learning
vector quantization network were used. The back propagation network provided satisfactory
results for all the studied partial discharge sources compared to the other types of ANNs.
Another method, which was not based on these distributions but focused on features that
describe the shape of the partial discharge pulses, namely the apparent charge, rise time,
fall time, width and area of the partial discharge pulse, has also been used [Mazroua et
al, 1993]. These five features together form the partial discharge patterns that were fed
to a multilayer ANN trained using the back propagation algorithm. The same features and
ANN were extended to the recognition of discharge sources of different types, such as
cavities and electric trees within the insulation system, as well as the recognition of
changes in the partial discharge pulse shapes associated with deterioration or ageing
effects within the defects undergoing discharge [Mazroua et al, 1995]. In a further study,
a new feature given by the product of the actual test voltage and the apparent charge was
added, and the pulse width was replaced by the product of the pulse width and the apparent
charge, which resulted in six features. These were used to train three different ANNs: the
multilayer perceptron, the nearest neighbour classifier and the learning vector
quantization network [Mazroua et al, 1994]. It was found that the recognition capabilities
of the three ANNs were comparable.

The back propagation feed forward ANN had also been used to discriminate the partial 
discharge of three kinds of electrode systems which resulted in 24 different classes 
[Okamoto et al, 1995]. 

In most of the above work concerning the application of ANNs to partial discharge pattern 
recognition, a multilayer perceptron with error back propagation as the learning algorithm 
has been used. Where it was compared with other network types, it provided a similar or 
slightly better recognition rate. 

The main objective of this chapter has been to study the behaviour of a multilayer 
perceptron model of artificial neural network (ANN) using the back propagation algorithm for 
partial discharge source classification. The features of the four texture analysis algorithms 
described in chapter 2 have been used to classify the six partial discharge sources 
described in chapter 3. Detailed studies have been carried out to determine the features in 
each texture analysis method which give the maximum classification accuracy, as well as 
the optimum number of features which can be used for partial discharge classification. 

6.2 Theoretical Background 

In this work, a multilayer feed forward neural network has been used to classify different 
sources of partial discharge. There are certain aspects to be taken into account before one 
can obtain an ANN system that is able to cope with the partial discharge pattern 
recognition task. Essentially there are two stages to be followed: the feature extraction 
stage and the classification stage. The first stage, which has already been done by using the 
texture analysis algorithms, is a step in which an m-dimensional pattern vector is mapped 
into a reduced n-dimensional feature vector, where n is equal to the number of extracted 
features. The feature vectors are then applied to the ANN to perform the classification 
stage. The main idea behind the classification is to define a boundary surface that divides 
the feature space into a number of disjoint regions that represent the different classes. 

6.2.1 Basic architecture 

The ANN mainly consists of processing elements and weighted connections [Haykin, 
1994]. Fig. 6.1 shows an architectural graph of a multilayer feed forward neural network 
with a single hidden layer. The neurons, or processing elements, in the ANN are arranged 
into three kinds of layers: input layer, hidden layer(s) and output layer. The network shown 
here is fully connected in the sense that every node in each layer of the network is 
connected to every node in the adjacent forward layer. The first layer is the input layer, 
which receives the respective elements of the input vector. Each neuron in the ANN collects 
the values from all of its input connections, performs a predefined mathematical operation 
(typically a dot product followed by a processing element function), and produces a single 
output value. The set of output signals of the neurons in the output layer of the network 
constitutes the overall response of the network to the activation pattern supplied by the 
source nodes in the input layer. 


Fig. 6.1: A typical multi-layer perceptron model (input layer, hidden layer, output layer) 

6.2.2 The feed forward concept 

In the feed forward neural network, signals can only propagate from the input layer to the 
output layer via one or more hidden layers. Only the nodes in the hidden layers and the 
output layer apply the activation function. Since the nodes in the input layer simply pass 
on the signal from the external source to the hidden layer, the output of each neuron in the 
input layer is given by: 

    $$O_i = n_i \qquad (6.1)$$

where $n_i$ is the input to neuron i and $O_i$ is the output of neuron i. For each neuron in 
the output layer, the input is given by: 

    $$n_k = \sum_{j=1}^{N_j} W_{kj} O_j, \qquad k = 1, \ldots, N_k \qquad (6.2)$$

where $W_{kj}$ is the connection weight between hidden neuron j and output neuron k, and 
$N_j$ and $N_k$ are the numbers of neurons in the hidden layer and output layer, respectively. 
The initial values of the connection weights are pre-set to small random values within a 
predefined range. The processing element activation functions map the processing element 
inputs into a prespecified range. Although the possible number of processing element 
activation functions is infinite, in this chapter the most common one, the sigmoid function, 
has been used. The neuron outputs are given by: 

    $$O_k = \frac{1}{1 + e^{-(n_k + \theta_k)}} \qquad (6.3)$$

where $\theta_k$ is an externally applied threshold that has the effect of decreasing or 
increasing the net input of the activation function. For the neurons in the hidden layer, the 
input and the output are given by relations similar to equations (6.2) and (6.3), 
respectively. 
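
The forward pass of equations (6.1) to (6.3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in this work: the function names (`sigmoid`, `layer_output`, `forward`, `init_weights`) are hypothetical, the threshold is assumed to be added to the net input, and the layer sizes (two inputs, four hidden neurons, six outputs) and the weight range of -0.01 to 0.01 are borrowed from the settings reported later in this chapter.

```python
import math
import random


def sigmoid(x):
    """Sigmoid activation used in eq. (6.3)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, thresholds):
    """Outputs of one layer; weights[k][j] links neuron j of the previous layer to neuron k."""
    outputs = []
    for w_row, theta in zip(weights, thresholds):
        net = sum(w * o for w, o in zip(w_row, inputs))  # weighted sum, eq. (6.2)
        outputs.append(sigmoid(net + theta))             # assumed sign convention, eq. (6.3)
    return outputs


def forward(features, w_hidden, th_hidden, w_out, th_out):
    """Propagate one feature vector through input, hidden and output layers."""
    o_input = list(features)                             # input layer passes values on, eq. (6.1)
    o_hidden = layer_output(o_input, w_hidden, th_hidden)
    o_output = layer_output(o_hidden, w_out, th_out)
    return o_hidden, o_output


def init_weights(n_rows, n_cols, rng, scale=0.01):
    """Small random weights in [-scale, scale]."""
    return [[rng.uniform(-scale, scale) for _ in range(n_cols)] for _ in range(n_rows)]


rng = random.Random(0)                                   # fixed seed, for repeatability only
w_hidden, th_hidden = init_weights(4, 2, rng), [0.0] * 4
w_out, th_out = init_weights(6, 4, rng), [0.0] * 6

hidden, out = forward([0.3, 0.8], w_hidden, th_hidden, w_out, th_out)
print([round(o, 3) for o in out])                        # six values, all close to 0.5 before training
```

With such small initial weights every net input is close to zero, so all six outputs start near 0.5; training is what moves them towards zero or one.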


6.2.3 The back propagation concept 

The neural network development involves training and testing phases. In the training phase, 
the weights of the ANN are adjusted to map the input of the system to its output. In the 
testing phase, the ANN should predict the correct system output for a given set of inputs, 
even if these were not used in training. An untrained or poorly trained network will give 
erroneous output. Therefore, as a measure of how the network is functioning during 
training, the output at the last layer has to be evaluated. The training algorithm is based on 
the minimisation of the error function on each pattern p by using the steepest descent method. 
The sum of the squared errors $E_p$, which is the error function for each pattern, is given by: 

    $$E_p = \frac{1}{2} \sum_{k=1}^{N_k} (t_{pk} - O_{pk})^2 \qquad (6.4)$$

where $t_{pk}$ is the target output for output neuron k and $O_{pk}$ is the calculated output 
for output neuron k. The overall measure of the error for all the input-output patterns is 
given by: 

    $$E = \sum_{p=1}^{NP} E_p \qquad (6.5)$$

where NP is the number of input-output patterns in the training set. The learning procedure 
used is the back propagation algorithm [Lippmann, 1987]. When an input pattern p with 
the target output vector $t_p$ is presented, the connection weights between the hidden and 
output layers are updated by using the following equations: 


    $$W_{kj}(p) = W_{kj}(p-1) + \eta\,\delta_{pk}\,O_{pj} + \alpha\,\Delta W_{kj}(p-1) \qquad (6.6)$$

    $$\delta_{pk} = (t_{pk} - O_{pk})\,O_{pk}\,(1 - O_{pk}) \qquad (6.7)$$


where $\eta$ is the learning rate and $\alpha$ is the momentum constant. The learning rate 
controls the size of the steps taken from the current position on the error surface 
down towards the minimum. A small $\eta$ results in very slow convergence, whereas a large 
value of $\eta$ leads to faster convergence but may also be accompanied by 
oscillation [Lippmann, 1987]. Momentum is a kind of memory that incorporates the weight 
change of the previous step and in this way damps useless oscillations. Similarly, the 
weight between input layer neuron i and hidden layer neuron j is 
updated by using the following equations: 

    $$W_{ji}(p) = W_{ji}(p-1) + \eta\,\delta_{pj}\,O_{pi} + \alpha\,\Delta W_{ji}(p-1) \qquad (6.8)$$

    $$\delta_{pj} = O_{pj}\,(1 - O_{pj}) \sum_{k=1}^{N_k} \delta_{pk}\,W_{kj} \qquad (6.9)$$

The effect of the threshold is represented by adding a new input signal fixed at unity with a 
weight equal to the threshold value. Therefore the thresholds are found in exactly the same 
manner as the connection weights. 
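
As a rough illustration of the hidden-to-output update with momentum described above, the following Python sketch applies the update twice to a single weight. The function names (`output_deltas`, `update_weights`) and the numerical values are hypothetical and chosen only to make the arithmetic easy to follow; the learning rate of 0.3 and momentum constant of 0.5 are the values quoted later in this chapter.

```python
def output_deltas(targets, outputs):
    """Error signal of each output neuron for the squared-error function, eq. (6.7)."""
    return [(t - o) * o * (1.0 - o) for t, o in zip(targets, outputs)]


def update_weights(weights, prev_changes, deltas, layer_inputs, eta, alpha):
    """In-place update of weights[k][j] with momentum, following eq. (6.6).

    prev_changes[k][j] stores the weight change applied at the previous step.
    """
    for k, delta in enumerate(deltas):
        for j, o_j in enumerate(layer_inputs):
            change = eta * delta * o_j + alpha * prev_changes[k][j]
            weights[k][j] += change
            prev_changes[k][j] = change


# One output neuron fed by one hidden neuron whose output is 1.0.
weights = [[0.0]]
prev_changes = [[0.0]]
deltas = output_deltas([1.0], [0.5])           # (1 - 0.5) * 0.5 * 0.5 = 0.125

update_weights(weights, prev_changes, deltas, [1.0], eta=0.3, alpha=0.5)
first = weights[0][0]                          # 0.3 * 0.125 = 0.0375

update_weights(weights, prev_changes, deltas, [1.0], eta=0.3, alpha=0.5)
second = weights[0][0]                         # 0.0375 + (0.0375 + 0.5 * 0.0375) = 0.09375
print(first, second)
```

The second step is larger than the first even though the error signal is unchanged, which is the memory effect of the momentum term.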


6.2.4 Modified back propagation method 

The back propagation algorithm with the sigmoid function, as described above, encounters the 
following difficulty. When the actual output value approaches either extreme value, the 
factor $O_{pk}(1 - O_{pk})$ in equation (6.7) makes the error signal very small. This implies 
that an output unit can be maximally wrong without producing a strong error signal with which 
the weights could be significantly adjusted. For instance, this occurs when some of the 
output values are pushed towards the wrong extreme value by competition in the network, 
thereby not increasing their error signal but instead decreasing it. In order to overcome this 
problem, instead of minimising the square of the differences between the actual and target 
values summed over the output units, the following modified error 
function has been minimised [Van Ooyen et al, 1992]: 

    $$E_p = -\sum_{k=1}^{N_k} \left[ t_{pk} \ln O_{pk} + (1 - t_{pk}) \ln (1 - O_{pk}) \right] \qquad (6.10)$$

The above function results in 

    $$\delta_{pk} = t_{pk} - O_{pk}$$

Thus, the error signal propagating back from each output unit is now directly proportional 
to the difference between the target value and the actual value. The modified back 
propagation method based on equation (6.10) has been used in the present work. The 
training data used in this chapter were obtained from the experimental work discussed in 
chapter 3. A total of 180 patterns were generated to describe the six classes, i.e. 30 patterns 
were obtained for each class. Fifty percent of these patterns were used for training, while all 
the patterns were used for testing the ANN. 
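
The difference between the two error signals is easiest to see numerically. The short Python sketch below compares the squared-error signal of equation (6.7) with the signal obtained from the modified error function for an output neuron whose target is 1 but whose output drifts towards 0. The function names are hypothetical and the output values are made up for illustration.

```python
def delta_squared_error(t, o):
    """Output error signal for the squared-error function, eq. (6.7)."""
    return (t - o) * o * (1.0 - o)


def delta_modified(t, o):
    """Output error signal for the modified error function: the sigmoid derivative cancels."""
    return t - o


target = 1.0
rows = []
for output in (0.5, 0.1, 0.01):                # output drifting towards the wrong extreme
    rows.append((output,
                 delta_squared_error(target, output),
                 delta_modified(target, output)))

for output, d_sq, d_mod in rows:
    print(f"O = {output:4.2f}   squared-error signal = {d_sq:.6f}   modified signal = {d_mod:.2f}")
```

At an output of 0.01 the squared-error signal has shrunk to about 0.0098 although the neuron is almost maximally wrong, while the modified signal is 0.99.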

6.3 Experimental Design 

The main aim of this chapter was to apply the ANN to investigate the discriminating power 
of the texture features in classifying different partial discharge sources. Based on the 
relative discriminating power, the optimum number of features for the desired 
classification accuracy could be selected. This has been achieved in two steps. The first 
step utilised each feature individually for partial discharge classification. From this step, 
one can determine the maximum classification accuracy obtainable with only one feature, 
as well as the best features for each technique. In the second step, all 
combinations of two features at a time were utilised and the best combination was 
established based on the maximum classification accuracy. The numbers of extracted 
features corresponding to the texture analysis techniques were 28, 22, 20 and 14 for 
SGLDM, GLDHM, GLRLM and PSM, respectively. This involved 84 features for the four 
techniques when used individually plus 890 combinations of two features (378 for 
SGLDM, 231 for GLDHM, 190 for GLRLM and 91 for PSM). 
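
The number of training runs quoted above follows directly from the feature counts: each technique with n features contributes n single-feature runs and n(n-1)/2 two-feature runs. A short Python check (the dictionary name is hypothetical):

```python
from math import comb

# Number of extracted features per texture analysis technique, as given in the text.
features_per_technique = {"SGLDM": 28, "GLDHM": 22, "GLRLM": 20, "PSM": 14}

single_runs = sum(features_per_technique.values())
pair_runs = {name: comb(n, 2) for name, n in features_per_technique.items()}
total_pair_runs = sum(pair_runs.values())

print(single_runs)       # 84
print(pair_runs)         # {'SGLDM': 378, 'GLDHM': 231, 'GLRLM': 190, 'PSM': 91}
print(total_pair_runs)   # 890
```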

It is well known that the performance of an ANN is controlled by parameters such as 
the learning rate, the momentum, the number of neurons in the hidden layer and the number 
of training iterations. All these parameters depend upon the input data. Therefore, to get 
the best classification accuracy, they have to be adjusted for the given 
application. The selection of these parameters in the present work is described 
below. 

Since the ANN in the present work has been used as a classifier for partial discharge sources 
with either individual features, which have different discriminating power, or 
combinations of features as input, the learning rate has not been fixed. A self adaptation 
method [Battiti, 1989] based on the gradient descent method has been used to adjust the 
learning rate. One starts with a learning rate $\eta$; in this study 0.3 was selected as the 
starting value. If the error function E to be minimised decreases, one updates 
$\eta = \beta\eta$, where $\beta = 1.1$, in each iteration until E increases. One then reduces 
$\eta = \sigma\eta$, with $\sigma = 0.5$, and iterates until a step that decreases E is found. 
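
The adaptation rule can be sketched as follows. This is a simplified illustration, not the procedure of [Battiti, 1989] in full: the function name `adapt_learning_rate` is hypothetical, the error values are invented, an unchanged error is treated as "not decreased" (an assumption), and whether a step that increased E is also undone is not stated in the text, so the sketch only adapts the rate.

```python
def adapt_learning_rate(eta, previous_error, current_error, beta=1.1, sigma=0.5):
    """Grow the learning rate after a successful step, shrink it after a failed one."""
    if current_error < previous_error:
        return beta * eta        # error decreased: take slightly larger steps
    return sigma * eta           # error did not decrease: halve the step size


errors = [0.25, 0.22, 0.20, 0.23, 0.21]   # made-up error values of successive iterations
eta = 0.3                                 # starting value used in this study
history = [eta]
for previous_error, current_error in zip(errors, errors[1:]):
    eta = adapt_learning_rate(eta, previous_error, current_error)
    history.append(eta)

print([round(value, 5) for value in history])   # [0.3, 0.33, 0.363, 0.1815, 0.19965]
```

Two successful iterations raise the rate from 0.3 to 0.363; the single increase in E halves it to 0.1815, after which it starts growing again.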

The number of neurons in the input layer was chosen to be the same as the number of features 
used for the ANN training, in this case either one or two. Six neurons were used in the output 
layer to represent the six possible partial discharge sources. For each partial discharge 
source, only one neuron gives a high output (one) and the other five neurons give a low 
output (zero). In the hidden layer, four neurons were used for both the cases of one and two 
input features. The number of neurons in the hidden layer was fixed by trial and error and 
was taken as the number resulting in the minimum error and the minimum number of 
training iterations. It was also found by trial and error that a momentum 
constant of 0.5 gives satisfactory results. 

6.3.1 Effect of number of training iterations 

One training iteration is defined as learning all training patterns once. To investigate the 
effect of the number of training iterations, the mean of the squared output error has been 
obtained for 100 iterations for all the features of the four techniques. Figs. 6.2a to 6.2c 
show the relation between the iteration number and the mean square error for the 22 
features of GLDHM for the case of single feature input. These figures show that: 

(i) There are two classes of features: one giving an almost constant mean square 
error for any number of learning iterations, and a second for which the mean 
square error decreases to a certain value and then remains constant. 

(ii) There is no considerable change in the mean square error for any of the 
features after the first 50 iterations, except for a few for which the error almost 
saturates in 100 iterations. 

To investigate more deeply the effect of increasing the number of training iterations on the 
classification accuracy of the ANN, the features of every technique were ranked according 
to their relative classification accuracy. Then the best five features for each technique, 
i.e. those with the maximum classification accuracy after 100 learning iterations, were 
retrained with the maximum number of iterations increased to 1000. Similarly, the ANN 
trained using combinations of two features was also retrained. While it was 
trained originally until the mean square error reduced to 10% or for a maximum of 300 
iterations, the retraining of the ANN was carried out with the best five combinations of two 
features until the mean square error reduced to 1% or for a maximum of 3000 iterations. The 
ANNs were then tested to establish their classification accuracy. The results of 
classification accuracy for the features used individually are shown in table 6.1, and the 
results with combinations of two features are shown in table 6.2. 

On increasing the number of iterations with individual features from 100 to 1000, the 
training time increased almost 10 times. However, no substantial increase in 
classification accuracy was obtained. In some cases, the classification accuracy 
even reduced. On increasing the number of iterations for the training with 
combinations of two features from 300 to 3000, the variation in the classification 
accuracy was even smaller than in the case of one feature. The variation was within 
5%, except for one case, the combination of features 3 & 13 of GLRLM, for which it was 
around 14%. 


Table 6.1: Effect of increasing the number of training iterations of ANN 
with one feature input 

Technique   Feature   Classification accuracy (%)    Mean square error (%) 
            number    100 iter.     1000 iter.       100 iter.     1000 iter. 
---------------------------------------------------------------------------- 
GLDHM       17        79.44         76.66            17.8          15.9 
            21        57.77         54.44            21.1          22.2 
            14        57.77         61.11            21.6          21.0 
            15        51.66         48.88            22.6          21.4 
            12        51.55         57.77            21.5          20.4 

SGLDM       17        55            57.22            21.7          20.7 
            24        54.44         66.11            21.8          19.0 
            20        50            70               21.4          18.6 
            18        46.66         48.88            22.2          21.3 
            7         43.88         65               18.6          19.6 

GLRLM       10        66.66         66.66            -             18.5 
            3         55            57.77            -             18.6 
            20        50            33.33            22.16         22.0 
            11        50            58.88            21.4          21.5 
            13        48.33         47.22            22.4          22.9 

PSM         10        32.22         42.22            24.9          24.1 
            14        28.33         28.88            25.7          24.7 
            5         27.77         2.77             27.0          26.9 
            7         26.11         27.77            26.2          27.1 
            13        22.77         25.55            24.0          24.5 


Table 6.2: Effect of increasing the number of training iterations of ANN 
with combination of two features input 

                        Maximum iterations = 300              Maximum iterations = 3000 
Technique   Features    Accuracy   Mean square   Actual       Accuracy   Mean square   Actual 
                        (%)        error (%)     iterations   (%)        error (%)     iterations 
---------------------------------------------------------------------------------------------- 
GLDHM       9&20        94.44      9.61          100          95.55      0.95          1677 
            10&21       93.33      11.22         300          90.55      0.99          1795 
            3&20        92.77      9.29          145          95         4.49          3000 
            7&17        92.22      9.81          111          96.11      0.97          2383 
            4&17        91.66      9.47          22           96.66      3.45          3000 

SGLDM       10&24       96.11      9.93          63           95.55      0.95          1677 
            10&27       95.55      9.59          267          96.11      0.83          1275 
            6&17        93.88      9.90          153          96.11      6.10          3000 
            13&20       93.33      9.06          107          97.22      0.91          660 
            10&17       93.33      9.46          54           93.33      5.85          3000 

GLRLM       10&13       80.55      12.73         300          78.33      13.2          3000 
            3&20        79.44      9.80          248          83.88      9.27          3000 
            3&13        75.55      16.83         100          89.44      11.4          3000 
            3&5         67.77      17.86         100          69.44      16.7          3000 
            10&17       66.66      18.76         100          66.44      18.2          3000 



Based on the above results, the features which make the training process faster were 
considered to be the best features. The training was therefore performed until the mean 
square error was less than 10%, or for a maximum of 100 iterations in the case of individual 
features and 300 iterations in the case of combinations of two features. 


6.3.2 Normalisation of inputs and weight initialisation 

Scaling of the input-output data has a significant influence on the convergence and 
also on the accuracy of the learning process. It is obvious from the sigmoidal activation 
function that the output of the network must lie within 0 to 1. Moreover, the input 
variables should be kept small in order to avoid saturation of the sigmoidal 
function. To normalise the input data, the maximum value of each input vector component 
was determined as follows [El-Makkawy, 1998]: 

    $$n_{i,max} = \max_{p}\, n_i(p), \qquad p = 1, \ldots, NP, \quad i = 1, \ldots, N_i \qquad (6.10)$$

where NP is the number of patterns in the training set and $N_i$ is the number of neurons in 
the input layer. The input data were normalised by this maximum value as follows: 

    $$n_{i,nor}(p) = \frac{n_i(p)}{n_{i,max}}, \qquad p = 1, \ldots, NP, \quad i = 1, \ldots, N_i \qquad (6.11)$$

After normalisation, the input variables lie within 0 to 1. The initial weights of the 
ANN were randomly initialised in the range -0.01 to 0.01. 
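
A minimal Python sketch of this normalisation is given below. The function name `normalise_by_max` and the sample feature values are hypothetical, and the sketch assumes that every feature is non-negative with a positive maximum, which is what keeps the result within 0 to 1.

```python
def normalise_by_max(patterns):
    """Divide each input component by its maximum over all patterns.

    patterns[p][i] is component i of pattern p. Assumes non-negative features
    with a positive maximum per component.
    """
    n_inputs = len(patterns[0])
    maxima = [max(pattern[i] for pattern in patterns) for i in range(n_inputs)]
    normalised = [[pattern[i] / maxima[i] for i in range(n_inputs)] for pattern in patterns]
    return normalised, maxima


# Three made-up patterns with two features each.
patterns = [[2.0, 10.0], [4.0, 5.0], [1.0, 20.0]]
normalised, maxima = normalise_by_max(patterns)

print(maxima)       # [4.0, 20.0]
print(normalised)   # [[0.5, 0.5], [1.0, 0.25], [0.25, 1.0]]
```

The returned maxima would also be needed to scale any pattern presented to the network after training.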

6.4 Results and Discussion 

The ANN utilised in this study had only one hidden layer with 4 neurons. The input 
pattern is a vector of length one or two, depending on the number of features used for the 
training of the ANN. For both the cases of one and two inputs, the ANN had six output 
neurons corresponding to the six different partial discharge sources. It is expected that after 
the ANN has completed its learning phase, even for unknown test patterns, only one of the 
six outputs is high or equal to one while the other five outputs are equal to zero. Each 
feature was used individually for the training of the ANN, and the classification accuracies 
of all the features obtained from the ANN testing for the four techniques are shown in 
Figs. 6.3 to 6.6. The classification accuracy has been calculated as the percentage of the 
original 180 patterns which were classified correctly during 
the ANN testing. 
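
The accuracy figure can be computed as sketched below. The decision rule is an assumption: the text only states that one output should be high, so the sketch assigns each pattern to the class whose output neuron is largest. The function names and the synthetic outputs (143 patterns answered correctly, 37 wrongly) are invented purely to show the arithmetic.

```python
def predicted_class(outputs):
    """Index of the output neuron with the highest value (assumed decision rule)."""
    return max(range(len(outputs)), key=lambda k: outputs[k])


def classification_accuracy(all_outputs, true_classes):
    """Percentage of patterns whose predicted class equals the true class."""
    correct = sum(1 for outputs, cls in zip(all_outputs, true_classes)
                  if predicted_class(outputs) == cls)
    return 100.0 * correct / len(true_classes)


# 180 patterns, 30 per class, classes numbered 0 to 5.
true_classes = [cls for cls in range(6) for _ in range(30)]

# Synthetic network outputs: the first 143 patterns are answered correctly,
# the remaining 37 are assigned to the neighbouring class.
all_outputs = []
for index, cls in enumerate(true_classes):
    winner = cls if index < 143 else (cls + 1) % 6
    all_outputs.append([0.9 if k == winner else 0.1 for k in range(6)])

accuracy = classification_accuracy(all_outputs, true_classes)
print(f"{accuracy:.2f}")   # 79.44
```

Because there are 180 test patterns, every attainable accuracy is a multiple of 100/180, i.e. about 0.56%.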

From Fig. 6.3, it is clear that with SGLDM the ANN was able to learn better by using the 
features which describe the horizontal direction (features 1-7 and 15-21) than the vertical 
direction (features 8-14 and 22-28), especially in the negative half cycle. The correlation 
between the partial discharge pulses in the horizontal direction (features 7, 21) has a better 
classification accuracy than the same feature in the vertical direction 
(features 14, 28). It was also observed that the ANN learnt the features which 
describe the negative half cycle (features 15-28) better than those of the positive half cycle 
(features 1-14). 

With the GLDHM, as shown in Fig. 6.4, the maximum classification accuracy was 
obtained by feature 17. This feature is related to the horizontal direction of the negative 
half cycle. In the positive half cycle, the maximum classification accuracy was achieved by 
feature number 1, which is also related to the horizontal direction. In this case also, it was 
noticed that the features which describe the negative half cycle (features 12-22) are more 
effective in the classification process than those of the positive half cycle (features 1-11). 
In the negative half cycle, the features belonging to the horizontal direction are more 
effective in partial discharge classification than those of the vertical direction. 



Fig. 6.3: Classification accuracy of SGLDM features 

Fig. 6.4: Classification accuracy of GLDHM features 

With GLRLM, as shown in Fig. 6.5, the vertical direction of the positive half cycle 
(features 6-10) is more effective in the classification process than the vertical 
direction of the negative half cycle (features 16-20). However, the horizontal direction of 
the negative half cycle is much better than the horizontal direction of the positive half 
cycle. The maximum classification accuracy was achieved by feature number 10. In the 
same cycle, the features which describe the horizontal direction (features 1-5 and 11-15) 
are more effective in classification, except features 5 and 15, which have classification 
accuracy less than features 10 and 20, respectively. 

With PSM, as shown in Fig. 6.6, the classification accuracy was very poor compared to the 
other techniques. The maximum classification accuracy was 32.22%, for feature number 10. 

Comparing the classification accuracy achieved with the ANN using the features of the 
texture analysis algorithms with the results of the minimum distance classifier, it was 
noticed that: 

(i) The classification accuracies of the features using the ANN are generally smaller 
than the classification accuracies of the same features using the minimum 
distance classifier. 

(ii) The ANN could not be trained by using certain features, which resulted in zero 
classification accuracy, for example features 1, 9 for SGLDM, features 
11, 13 for GLDHM, features 9, 16 for GLRLM and features 1, 9 for PSM. 


The poor classification accuracy of the ANN might have resulted from the overlapping 
between the partial discharge sources in the feature space. To investigate the effect of 
overlapping between the classes on the classification accuracy, the scatter of the six sources 
of partial discharge for the best and worst features (those having the maximum and 
minimum classification accuracy) of GLDHM is shown in Figs. 6.7 & 6.8. The 
horizontal axis is the pattern number from 1 to 180. Patterns 1 to 30, 31 to 60, 61 to 90, 
91 to 120, 121 to 150 and 151 to 180 belong to classes 1, 2, 3, 4, 5 and 6, respectively. The 
vertical axis represents the value of the corresponding feature. It is clear that the classes are 
more separate with feature 17, since there is no overlapping between classes 1, 3, 4, 5 and 6. 
However, they become closer with feature 11, since there is clear overlapping between 
classes 1, 3, 4 and 6. There is also overlapping between classes 2 and 5. For PSM, which gives 
the worst result, the scatter of feature 4 is shown in Fig. 6.9, which shows that all the 
classes overlap. 



Fig. 6.7: Distribution of partial discharge patterns using feature 17 of GLDHM 

Fig. 6.8: Distribution of partial discharge patterns using feature 11 of GLDHM 

Fig. 6.9: Distribution of partial discharge patterns using feature 4 of PSM 

6.5 Effect of Increasing the Number of Features Used for Classification 

By studying the classification accuracy using only one feature at a time, it is observed that 
the maximum classification accuracy achieved is around 80%, in the case of GLDHM. 
Hence one feature is not sufficient to achieve a good classification accuracy between the 
partial discharge sources. Therefore, in the next step, the effect of using two features at the 
same time for the training of the ANN was studied. The classification accuracy of 
SGLDM, GLDHM, GLRLM and PSM using all the possible combinations of two 
features for each technique was investigated. The twenty most effective combinations of 
features for each technique are shown in Figs. 6.10 to 6.13. 

From Fig. 6.10, one can notice that the combinations of features which gave the best 
classification accuracy generally had one feature belonging to the positive half cycle and 
the second feature to the negative half cycle, except the combination (7,11), which belongs 
to the positive half cycle only. It is further noticed that these combinations are equally 
divided into four groups. The first group is related only to the horizontal direction of the 
positive and negative half cycles, e.g. (3,17), (3,20), (6,17). The second group is related 
only to the vertical direction of the positive and negative half cycles, e.g. (10,24), (10,27), 
(13,27). The third group combines the horizontal direction of the positive half 
cycle with the vertical direction of the negative half cycle, e.g. (3,27), (4,25), (6,24). The 
fourth group combines the vertical direction of the positive half cycle with the 
horizontal direction of the negative half cycle, e.g. (10,20), (11,18), (13,17). 

From Fig. 6.11, it is clear that all the combinations of features which have the maximum 
classification accuracy combine the positive and negative half cycles, except the 
combination (4,6), in which both features belong to the positive half cycle. The 
combinations were distributed between horizontal-horizontal, e.g. (1,17), (4,17), (6,17), 
vertical-vertical, e.g. (7,18), (9,20), (10,21), and mixed from both directions, e.g. (1,18), 
(3,20), (7,17), (10,12). By using GLRLM, out of the best 20 combinations, 8 combinations 
were related only to the positive half cycle, as shown in Fig. 6.12. From Fig. 6.12, it is 
interesting to note that feature number 10, which was the best when used individually, 
is common to most of the combinations, e.g. (3,10), (4,10), (5,10), (6,10). 

With PSM, the situation is slightly different: some combinations were related to both the 
positive and negative half cycles, e.g. (3,12), (3,13), while other combinations were related 
to the negative half cycle only, e.g. (10,11), (10,12). By using combinations of two features, 
the PSM classification accuracy has improved, but it is still poor 
compared to the other techniques. The maximum classification accuracies for the 
combinations of two features extracted from the four techniques were 96.11%, 94.44%, 
80.55% and 52.77% for SGLDM, GLDHM, GLRLM and PSM, respectively. This means that 
SGLDM and GLDHM have good discriminating power compared to GLRLM and PSM. 

To investigate the effect of the overlapping between the classes, the scatter distributions of 
the six classes of partial discharge sources, according to the best and the worst combinations 
of two features for GLDHM, are shown in Figs. 6.14 & 6.15. It can be observed that by 
using the combination (9,20), which is the best combination, classes 1, 2, 5 and 6 are 
clearly separable, while using the combination (12,17), only classes 4 and 5 are clearly 
separable. The scatter of the same classes according to the combination (3,4), which 
belongs to PSM, is shown in Fig. 6.16. From Fig. 6.16, it is extremely difficult to recognise 
any class because of the high overlapping between the classes, which resulted in the poor 
classification accuracy. 

Fig. 6.10: Classification accuracy of SGLDM features combinations 

Fig. 6.11: Classification accuracy of GLDHM features combinations 

Fig. 6.12: Classification accuracy of GLRLM features combinations 

Fig. 6.13: Classification accuracy of PSM features combinations 

Fig. 6.14: Distribution of partial discharge patterns using features (9,20) with GLDHM 

Fig. 6.15: Distribution of partial discharge patterns using features (12,17) with GLDHM 

Fig. 6.16: Distribution of partial discharge patterns using features (3,4) with PSM 


6.6 Conclusions 

In this chapter a multilayer feed forward neural network, based on the modified back 
propagation training algorithm, has been used for the classification of different partial 
discharge sources. The ANN was trained for the partial discharge sources representing 
glow corona, streamer corona, surface discharge, internal discharge, single protrusion, and 
multi protrusions. Based on the results reported in this chapter, the following main 
conclusions can be drawn: 


1- The back propagation ANN with input features derived from the texture analysis 
techniques is capable of distinguishing between patterns of the six types of simple 
partial discharge sources with appropriate selection of feature combinations for its 
training. 

2- Since some of the partial discharge sources deviate from the normal distribution, it was 
expected that the ANN would improve the classification accuracy. However, it was found 
that the classification accuracy was slightly reduced as compared to the minimum distance 
classifier results reported in chapter 4. This may be due to the overlap between the 
attributes of different partial discharge sources. 

3- Only two features are sufficient to obtain acceptable classification accuracy. Combinations 
of more than two features increase the computational time substantially. 

4- Out of the four texture analysis techniques, the SGLDM and GLDHM provide almost 
the same classification accuracy. GLRLM has less discriminative power compared to 
the SGLDM and GLDHM. The PSM provides poor classification accuracy. 

Chapter 7 

Partial Discharge Patterns Classification for a Practical Case 


7.1 Introduction 

In the previous chapters, the discriminating power of the texture analysis algorithm 
features has been examined by using six standard partial discharge sources created in the 
laboratory with the help of different electrode systems. It was shown that the GLDHM, 
SGLDM and GLRLM are able to distinguish the different partial discharge sources by 
using only two features at a time. It was felt worth examining the discriminating power of 
these features using practical partial discharge sources. In this chapter, a cable test sample 
has been used for this purpose. Both internal and surface discharge are expected to be 
present in the cable sample. The classifier was used to distinguish between the internal 
discharge patterns and the internal discharge plus the surface discharge patterns. 

7.2 Background 

In addition to the various obvious and by far the most usual reasons for failure of cables, such 
as mechanical damage to the insulation, ingress of moisture, thermal breakdown, and 
breakdown under transient voltage conditions, partial discharge is also a basic reason 
for their failure [Bungay, 1990]. Cables are usually tested for the presence of partial 
discharge activity before installation, to detect any insulation defects that may have 
occurred during the manufacturing process. Partial discharge tests are also used to detect 
insulation deterioration under normal service operating conditions. Partial discharges in 
a cable are caused by the breakdown of the gas contained within voids in the insulation. 
The voids may be either dielectric-bounded or at the interface between the dielectric and 
the conducting screens or conductor. 

All cables are subjected to changes in load and therefore to temperature cycling during 
operation. The change due to the thermal expansion of both the conductors and the 
insulating material under the metal sheath, for example in paper insulated cables, produces 
small cavities (voids) in the insulation which, when of a certain size, start to produce 
partial discharge. At this stage not only are the dielectric losses increased but also, where 
high voltages are concerned, the service life of the cable may be reduced. 

For distribution and transmission purposes, impregnated paper cables have had an 
impressive record of reliability since the turn of the century, although they are being replaced 
these days by new types of insulation such as XLPE. The insulation in paper insulated 
cables consists of helically applied paper tapes. The cables designed with a belt of 
insulation over the laid up cores are the most common type used up to 11 kV. Beyond 11 
kV, screened cables have to be used. Screening consists of a thin metallic layer in contact 
with the metallic sheath. 

The perfect belted cable would be so manufactured that the impregnant would completely 
fill the interstices between the conductor, the fibers of the papers and the filler material. In 
short, the whole volume contained within the lead or aluminum sheath would be 
completely void free. During installation, however, the perfect belted cable undergoes 
mechanical manipulation and movement of the cores relative to each other takes place. In 
addition, after temperature cycling, the lead sheath does not return to its original dimensions 
and consequently the interior of the sheath is no longer completely occupied, so that 
voids are formed within the cable. The voids contain low pressure gas extracted from the 
impregnant. Such voids are particularly hazardous when they occur within the highly 
stressed zones of the insulation. 

The breakdown of paper insulated cable due to partial discharge occurs through the 
formation of carbonaceous paths on the insulation papers. This is generally known as treeing. 
The carbonaceous paths start at an almost imperceptible carbon core and gradually spread 
outwards through the insulation. 

7.3 Requirement for Stress Control 

In the region of the termination of a cable dielectric screen, it can be seen that there is not only 
an increase in stress within the insulation in that region, but also a potential 
gradient along the interface between the dielectric and the surrounding space. The stress in 
the dielectric at the screen termination will be well above the design stress of the cable and 
premature failure can occur at this point. In addition, if the medium surrounding the 
termination is air then the stress in this area may be sufficient for the air to discharge even 
at working voltage. Thus in designing the termination of a high voltage cable, it is necessary 
to be aware of both these problems and to include some form of stress control, such as the following. 

7.3.1 Stress cone 

The traditional method of stress relief is the use of a stress cone. The stress cone is a means 
of controlling the capacitance in the area of the screen termination, thereby reducing the 
stress in the dielectric. The stress cone is continued beyond the screen termination so as 
to reduce the potential gradient at the surface of the dielectric to a level where partial 
discharge will not occur. 

7.3.2 High permittivity materials 

Materials with relative permittivities significantly higher than that of the dielectric can provide 
excellent stress control at terminations. When materials of dissimilar permittivities are 
subjected to a potential gradient across their combined thickness, the material with the 
lowest permittivity will be subjected to the highest stress. This is the physical phenomenon 
which enables stress control to be achieved by high permittivity materials. 
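
The stress division described above can be illustrated with a minimal sketch. It assumes an idealised geometry that is not taken from the cable termination itself: two plane dielectric layers in series between parallel electrodes, with no free charge at the interface, so that the flux density is the same in both layers. The function name `layer_fields` and all numerical values are hypothetical.

```python
def layer_fields(voltage, d1, eps1, d2, eps2):
    """Return the field strengths (E1, E2) in two plane layers in series.

    Idealised parallel-plate assumption: eps1 * E1 = eps2 * E2 at the
    interface, and the layer voltages add up to the applied voltage,
    voltage = E1 * d1 + E2 * d2.
    """
    e1 = voltage / (d1 + d2 * eps1 / eps2)
    e2 = e1 * eps1 / eps2
    return e1, e2


# Hypothetical example: 10 kV across 2 mm of relative permittivity 3.5
# in series with 2 mm of relative permittivity 20.
e_low_eps, e_high_eps = layer_fields(10e3, 2e-3, 3.5, 2e-3, 20.0)
print(f"field in low permittivity layer : {e_low_eps / 1e6:.2f} kV/mm")
print(f"field in high permittivity layer: {e_high_eps / 1e6:.2f} kV/mm")
```

Under these assumptions the layer with the lower permittivity carries the larger share of the stress, which is the effect the text relies on.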

7.3.3 Resistive coating 

Stress control can be achieved by the application of a resistive layer to the insulation 
surface at the screen termination. Ideally the layer will pass a small current and will 
therefore set up a linear voltage gradient along its length. However, the resistivity of the 
material has to be within quite a narrow band for the termination to work successfully. If 
the resistivity is too low, then the material will simply act as an extension of the dielectric 
screen and a high stress region will be created at the end of the resistive layer. If the 
resistivity is too high, then the material will have no appreciable effect and the screen 
termination will remain a high stress region. 

7.4 Measurement of Partial Discharge in Cables 

Partial discharge measurements in capacitors and short cable lengths, which behave 
essentially as lumped capacitances, are straightforward. With longer cable lengths, pulse 
superposition due to reflected discharge pulses may lead to considerable error in the 
measurement of both the discharge amplitude and the discharge repetition rate. It is only 
with very long cables that amplitude superposition errors due to reflection do not arise; 
however, the pulse counting errors remain [Bartnikas, 1990]. Partial discharge pulse 
reflection may be easily eliminated by impedance termination of the far cable end with a 
resistance R in series with a partial discharge free capacitor C. Here R corresponds to the 
characteristic impedance of the cable and C is equal in value to the blocking capacitor 
used in the RLC detection circuit. 
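
The reason a termination equal to the characteristic impedance suppresses reflections can be sketched with the standard transmission line reflection coefficient. This is an illustration only: it assumes that, at the frequencies of the discharge pulse, the series capacitor C behaves as a short circuit so that the termination is purely resistive, and the impedance value used is hypothetical.

```python
def reflection_coefficient(r_term, z0):
    """Voltage reflection coefficient of a resistive termination r_term
    on a line of characteristic impedance z0 (both in ohms)."""
    return (r_term - z0) / (r_term + z0)


Z0 = 30.0  # hypothetical characteristic impedance of the cable, in ohms

matched = reflection_coefficient(Z0, Z0)          # R equal to Z0
mismatched = reflection_coefficient(10 * Z0, Z0)  # R ten times Z0

print(f"matched termination   : {matched:.3f}")
print(f"mismatched termination: {mismatched:.3f}")
```

With R equal to the characteristic impedance the coefficient is zero and no pulse is reflected; as R grows towards an open circuit the coefficient approaches one.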

7.5 Experimental Work 

To investigate the ability of texture analysis algorithms to discriminate between partial 
discharge sources, an 11 kV belted cable of 5 m length was used. Since the cable sample was 
very short, the cable end did not require any impedance termination and the far end was 
left open circuit. In such a sample, the following sources of partial discharge were expected: 

• Internal discharge in the voids between the insulating paper 

• Surface discharge at both the ends between the core and lead sheath. 

The partial discharge at the cable ends could be eliminated by using stress cones. Two steel 
stress cones were connected to the lead sheath. Both ends of the cable were kept in the 
vertical position and the two cones were filled with transformer oil. Therefore, from the 
cable sample, the following classes of partial discharge could be measured: 

• Internal plus surface discharge at the same time. 

• Internal discharge only, by using stress cones to suppress the surface discharge. 

The high voltage source was connected to the core and the lead sheath was grounded 
through the RLC measuring impedance. The measurement set up, shown in Fig. 7.1, was 
used to measure the partial discharge pulses in this cable. 







Fig. 7.1: Partial discharge measurement circuit 

Where, 

(1) Cable test sample 
(2) Coupling capacitor 
(3) Measuring impedance 
(4) Filter and amplifier 
(5) Digital storage oscilloscope 
(6) Personal computer 


7.6 Results and Discussion 

Since the features of GLDHM have shown a good classification accuracy compared to the 
other techniques, as shown in the previous chapters, these have been used in this study for 
three different numbers of power frequency cycles, namely 2, 4 and 8 cycles. The 
classification accuracy results are shown in Fig. 7.2. Thirty patterns were used to represent 
each partial discharge class. Fifteen patterns were used for training of the minimum 
distance classifier and the complete set of patterns was used for testing. In the case of 2 
cycles, the classification accuracy of all the texture features was found to be 100%. The 
classification accuracy of the polarity factor (features 6 & 17, not a texture feature) was 
86.66% & 90% in the positive and negative half cycles, respectively, and its classification 
accuracy increased with increasing number of cycles. However, the classification accuracy 
of the texture features slightly decreased for the cases of 4 and 8 cycles. The features 
related to the positive half cycle were found to be better than those related to the negative half 
cycle. From Fig. 7.2, one can notice the following: 

(i) In the case of 2 cycles, all the texture features related to the horizontal and 
vertical directions have the same classification accuracy when used in the 
positive and negative half cycles. 

(ii) In the case of 4 cycles, the contrast (feature number 14 in the horizontal 
direction and number 20 in the vertical direction, both in the negative half 
cycle) has the worst classification accuracy. However, it has the best 
classification accuracy in the positive half cycle (feature 3 in the 
horizontal direction and feature 9 in the vertical direction). In the positive half 
cycle, feature number 1 (mean) has almost the same classification accuracy as 
feature number 3 (contrast) in the horizontal and vertical directions. 

(iii) Using 8 cycles, the contrast has the worst classification accuracy in the 
horizontal and vertical directions of both the positive and negative half cycles 
(features 3, 9, 14 and 20). 

(iv) Also, for the case of 8 cycles, the five texture features of GLDHM have the 
same relative classification accuracy when used 
for the horizontal or vertical direction of each half cycle individually (features 
1, 2, 3, 4, 5 and 7, 8, 9, 10, 11 for the positive half cycle as well as features 12, 
13, 14, 15, 16 and 18, 19, 20, 21, 22 for the negative half cycle). The same can 
also be observed in the case of 4 cycles, in the negative half cycle only. 


Fig. 7.2: Classification accuracy of GLDHM features for cable sample 
(horizontal axis: feature number) 


Using two features at a time, the overall accuracy improved for the different numbers 
of cycles. It is important to investigate the direction of both the features in each 
combination to determine which direction gives the higher classification accuracy. The possible 
combinations are horizontal-horizontal features, vertical-vertical features and 
mixed features, i.e. horizontal-vertical or vertical-horizontal. Fig. 7.3 shows the directions 
of the features in all the possible combinations for classification accuracy greater than 
certain values. These values are 100%, 95%, 90%, 85% and 80%. The figure is divided into 
three portions for the cases of 2, 4, and 8 cycles. The number of combinations in each 
direction was expressed as a percentage of the total number of possible combinations. In the 
case of 2 cycles, the classification accuracy of all the combinations is 100% except the 
combination 7-17, which has 96.66% classification accuracy. In the case of 4 cycles, there 
is no combination having 100% classification accuracy. The maximum classification 
accuracy was 98.33% for about 15% of the combinations. For the case of 8 cycles, the 
maximum classification accuracy was 100% for around 30% of the combinations. From 
Fig. 7.3, the following observations can also be made: 


Fig. 7.3: Relation between the classification accuracy of the feature combinations 
of GLDHM and the direction of these features (panels: 2 cycles, 4 cycles, 8 cycles) 

(i) For any number of power frequency cycles, the number of horizontal-horizontal 
combinations offering a certain classification accuracy is greater than 
the number of vertical-vertical combinations. 

(ii) The number of mixed combinations offering any classification accuracy is 
always greater than the sum of the horizontal-horizontal and 
vertical-vertical combinations. 

(iii) In the cases of 2, 4 and 8 cycles, more than 99% of the combinations have a 
classification accuracy greater than 80%. 
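
The sizes of the three direction groups can be counted with a short sketch. It uses the feature numbering given in the list for Fig. 7.2 (features 1 to 5 and 12 to 16 horizontal, 7 to 11 and 18 to 22 vertical, 6 and 17 polarity factor) and assumes that every pair drawn from the 22 features was considered; the text mentions combination 7-17, which suggests that pairs containing a polarity factor were included, but this is an assumption.

```python
from itertools import combinations

HORIZONTAL = set(range(1, 6)) | set(range(12, 17))
VERTICAL = set(range(7, 12)) | set(range(18, 23))


def direction(pair):
    """Label a pair of feature numbers by the directions of its members."""
    a, b = pair
    if a in HORIZONTAL and b in HORIZONTAL:
        return "horizontal-horizontal"
    if a in VERTICAL and b in VERTICAL:
        return "vertical-vertical"
    if (a in HORIZONTAL or a in VERTICAL) and (b in HORIZONTAL or b in VERTICAL):
        return "mixed"
    return "with polarity factor"  # pair contains feature 6 or 17


counts = {}
for pair in combinations(range(1, 23), 2):
    label = direction(pair)
    counts[label] = counts.get(label, 0) + 1

total = sum(counts.values())
for label, count in counts.items():
    print(f"{label:22s} {count:3d}  ({100.0 * count / total:.1f}% of {total})")
```

Under this assumption the pool itself contains more mixed pairs than horizontal-horizontal and vertical-vertical pairs together, so observation (ii) may partly reflect the number of available pairs rather than the discriminating power of mixed pairs alone.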

7.7 Conclusion 

In this chapter, the GLDHM has been used to distinguish two different partial discharge 
sources generated from an 11 kV belted cable sample. The first partial discharge source was 
a combination of the internal discharge in the voids between the insulating paper and the 
surface discharge at the terminal points of the cable between the core and the lead sheath. 
The second partial discharge source was a pure internal discharge, obtained after eliminating the surface 
discharge by using a stress cone at both ends of the cable. The features used were able to 
distinguish the two different partial discharge sources successfully. The features related to the 
horizontal direction have better discriminating power compared to the features related to 
the vertical direction. However, the relative classification accuracy of the texture features, 
when applied to the cable sample, differs from their relative classification accuracy 
when applied to the six partial discharge sources in chapter 4. This means that the relative 
classification accuracy of the texture features depends upon the partial discharge sources 
used to generate these features. 

Chapter 8 

Partial Discharge Patterns Classification Using 
the Principal Component Transformation 


8.1 Introduction 

In chapters 4 to 7, the four texture analysis methods were used to generate various features 
to identify different partial discharge sources. The main aim of the work carried out in 
these chapters was to identify the features which have the most discriminative power for 
the accurate classification of different partial discharge sources. From the results 
of the minimum distance classifier in chapter 4, it was found that the discriminative power 
of any feature of the four texture analysis algorithms depends upon the number of power 
frequency cycles used to represent the partial discharge sources. A similar result has been 
achieved by using the transformed divergence analysis in chapter 5. In chapter 6, the multi 
layer feed forward artificial neural network was used. However, it was not able to improve 
the classification accuracy of the texture features. In chapter 7, the features of GLDHM 
were used to classify two different partial discharge sources in a cable sample by using the 
minimum distance classifier. It was observed, in the previous chapters, that the relative 
classification accuracy of the texture features also depends upon the number of power 
frequency cycles. On the other hand, the relative classification accuracy of the features 
differed from the results obtained from the minimum distance classifier in chapter 4 for the 
six partial discharge sources. In view of these observations, it is difficult to select the best 
feature or the best combination of features for any texture analysis technique by the direct feature 
selection techniques. 

The direct feature selection is done by eliminating the features which contribute little to the 
separability of the partial discharge sources, as described in the previous chapters. 
Alternatively, one can use a combination of all the features by transforming the original set 
into a new set of features in which separability is higher in a subset of the transformed 
features than in any subset of the original data. The new set of features can be generated 
from the original set by using the principal component transformation [Jolliffe, 1986]. The 
principal component transformation maps the original set of features into a new and 
uncorrelated set of features. Moreover, it produces a space in which the data has maximum 
variance along its first axis, the next largest variance along a second mutually orthogonal 
axis and so on. The later principal components would be expected, in general, to show a 
smaller variance. These could be considered, therefore, to contribute little to the 
separability and could be ignored, thereby reducing the essential dimensionality of the 
classification space and thus improving classification speed. This is only of value, 
however, if there is correlation between the features in the original feature space and some 
features have higher variance compared to the other features. The principal 
component transformation has already been applied to the partial discharge problem, but 
using statistical parameters like skewness, kurtosis, etc. describing the shapes of the (q_av-φ), 
(q_max-φ) and (n-φ) distributions [Kiivada, 1995]. 

The main aim of the work carried out in this chapter is to use the principal component 
transformation technique for mapping the original set of features to a reduced set in order 
to minimize the classification time. The measurements of the six partial discharge sources 
discussed in chapter 3 and the two types of partial discharge in the cable sample described 
in chapter 7 have been used in the present work, for the sake of comparison. Investigations 
have been carried out to establish the ability of the principal component transformation to 
solve the problems with the direct feature selection technique mentioned earlier. 

8.2 Principal Component Transformation 

In N dimensional space, where N is the number of features used for classification, consider 
that each class, i.e. partial discharge source, has been represented by M patterns. In this case, 
each pattern can be described by a vector of N components. The mean value of each 
component can be calculated as follows: 

\[ \mathrm{mean}_x(i) = \frac{1}{M} \sum_{j=1}^{M} x(i,j) \quad \text{for } i = 1, \ldots, N \tag{8.1} \]

While the mean values vector is useful to define the average value of the given data, it is 
also very important to investigate the scatter or spread of the data in the N dimensional 
space. The covariance matrix can be used for this purpose. Its elements $\sigma_x(i,k)$ are 
calculated as follows: 

\[ \sigma_x(i,k) = \frac{1}{M-1} \sum_{j=1}^{M} \bigl(x(i,j) - \mathrm{mean}_x(i)\bigr) \bigl(x(k,j) - \mathrm{mean}_x(k)\bigr)^{t} \quad \text{for } i = 1, \ldots, N \text{ and } k = i, \ldots, N \tag{8.2} \]
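
Equations (8.1) and (8.2) can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code used in this work: the function names are hypothetical, the patterns are stored as a list of M feature vectors of length N, and the three example patterns are made up.

```python
def mean_vector(patterns):
    """Mean of each of the N components over the M patterns, eq. (8.1)."""
    m = len(patterns)
    n = len(patterns[0])
    return [sum(p[i] for p in patterns) / m for i in range(n)]


def covariance_matrix(patterns):
    """N x N covariance matrix with the 1/(M-1) normalisation, eq. (8.2)."""
    m = len(patterns)
    n = len(patterns[0])
    mu = mean_vector(patterns)
    return [
        [
            sum((p[i] - mu[i]) * (p[k] - mu[k]) for p in patterns) / (m - 1)
            for k in range(n)
        ]
        for i in range(n)
    ]


# Made-up example: three patterns described by two features.
patterns = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
mu = mean_vector(patterns)
cov = covariance_matrix(patterns)
print("mean vector      :", mu)
print("covariance matrix:", cov)
```

In this example the second feature is exactly twice the first, so the off-diagonal element is as large as the features' variances allow.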

The covariance matrix provides an important mathematical concept in the analysis of data 
in multidimensional space. If there is a correlation between the responses of partial 
discharge sources using certain features, the corresponding off-diagonal element in the 
covariance matrix will be large as compared to the diagonal elements. On the other hand, if 
there is little correlation, the off-diagonal elements will be close to zero. This behavior can 
be described in terms of the correlation matrix R whose elements are related to the 
covariance matrix by, 

\[ R(i,k) = \frac{\sigma_x(i,k)}{\sqrt{\sigma_x(i,i)\,\sigma_x(k,k)}} \quad \text{for } i = 1, \ldots, N \text{ and } k = i, \ldots, N \tag{8.3} \]
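
The normalisation by the standard deviations can be illustrated with a short sketch. It assumes the usual definition in which each covariance element is divided by the product of the two standard deviations; the function name and the covariance values are hypothetical.

```python
def correlation_matrix(cov):
    """Correlation matrix from a covariance matrix, eq. (8.3)."""
    n = len(cov)
    std = [cov[i][i] ** 0.5 for i in range(n)]
    return [[cov[i][k] / (std[i] * std[k]) for k in range(n)] for i in range(n)]


# Made-up covariance matrix of two features.
cov_a = [[4.0, 3.0], [3.0, 9.0]]
# The same data with the first feature expressed in units ten times smaller.
cov_b = [[400.0, 30.0], [30.0, 9.0]]

r_a = correlation_matrix(cov_a)
r_b = correlation_matrix(cov_b)
print("correlation, original units:", r_a[0][1])
print("correlation, rescaled units:", r_b[0][1])
```

Both matrices give the same correlation, which shows in a small case why the correlation matrix does not depend on the measuring units.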

It is clear that the correlation matrix is more suitable to find the correlation between the 
different features than the covariance matrix. Since the correlation matrix is normalized by 
the standard deviation of each feature, it is independent of the measuring units. It is 
fundamental to the development of the principal components transformation to ask whether 
there is a new co-ordinate system in which the data could be represented without 
correlation. In other words, the covariance matrix in the new co-ordinate system should be 
diagonal. This means that, if the vectors x describing the patterns in the original feature 
space are represented as vectors y in the new co-ordinate system, it is desirable to find a 
linear transformation G of the original co-ordinates, such that y = G x, subject to the 
constraint that the covariance matrix of the data in y space is diagonal. In y space the 
covariance matrix is, by definition, 

\[ \sigma_y(i,k) = \frac{1}{M-1} \sum_{j=1}^{M} \bigl(y(i,j) - \mathrm{mean}_y(i)\bigr) \bigl(y(k,j) - \mathrm{mean}_y(k)\bigr)^{t} \tag{8.4} \]

and $\mathrm{mean}_y$, the mean vector expressed in terms of the y co-ordinates, is defined as, 

\[ \mathrm{mean}_y(i) = \frac{1}{M} \sum_{j=1}^{M} y(i,j) = \frac{1}{M} \sum_{j=1}^{M} G\,x(i,j) = G\,\frac{1}{M} \sum_{j=1}^{M} x(i,j) = G\,\mathrm{mean}_x(i) \quad \text{for } i = 1, \ldots, N \tag{8.5} \]

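where $\mathrm{mean}_x$ is the data mean in x space. Therefore, 

\[ \sigma_y(i,k) = \frac{1}{M-1} \sum_{j=1}^{M} \bigl(G\,x(i,j) - G\,\mathrm{mean}_x(i)\bigr) \bigl(G\,x(k,j) - G\,\mathrm{mean}_x(k)\bigr)^{t} \tag{8.6} \]

which can be written as 

\[ \sigma_y(i,k) = G \left[ \frac{1}{M-1} \sum_{j=1}^{M} \bigl(x(i,j) - \mathrm{mean}_x(i)\bigr) \bigl(x(k,j) - \mathrm{mean}_x(k)\bigr)^{t} \right] G^{t} \tag{8.7} \]

or 

\[ \sigma_y = G\,\sigma_x\,G^{t} \tag{8.8} \]

Since $\sigma_y$ must be diagonal, G can be recognized as the transposed matrix of eigen vectors 
of $\sigma_x$, provided that G is an orthogonal matrix. As a result, $\sigma_y$ can then be identified as the 
diagonal matrix of eigen values of $\sigma_x$, i.e. it will be in the form: 

\[ \sigma_y = \begin{bmatrix} \lambda_1 & 0 & 0 & \cdots & 0 \\ 0 & \lambda_2 & 0 & \cdots & 0 \\ 0 & 0 & \lambda_3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lambda_N \end{bmatrix} \]

Its elements will be the variances of the data in the respective transformed co-ordinates. It is 
arranged such that $\lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_N$. Hence the data exhibits maximum variance in 
$y_1$, the next largest variance in $y_2$ and so on, with minimum variance in $y_N$. 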
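
As a small worked illustration of the diagonalisation, consider a made-up covariance matrix of two features (the numbers are not taken from the measurements in this work). Its eigen values are 3 and 1, and the rows of G are the corresponding unit eigen vectors:

```latex
\sigma_x = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
G = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \qquad
\sigma_y = G\,\sigma_x\,G^{t}
         = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
           \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
           \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
         = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}
```

In the transformed co-ordinates the two components are uncorrelated, and the first carries three quarters of the total variance (3 out of 4).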


8.3 Eigen Value Calculation 

To calculate the eigen values $\lambda$ of the covariance matrix A, the following equation must 
be solved: 

\[ A x = \lambda x \tag{8.9} \]

Because of the presence of the unknown vector x on both sides of the above equation, the 
solution methods for the eigen value problem will be essentially iterative in nature. There are 
three different approaches to solve the eigen value problem [Griffiths, 1991]. The first 
method is to find the roots of the characteristic polynomial. The second one is the 
transformation method, in which the matrix A is transformed into a new matrix B which has 
the same eigen values as A. However, the eigen values of the transformed matrix are easier 
to compute than the eigen values of the original matrix. This transformation can be done if 
the equation (8.9) is post multiplied by a rotation matrix and pre multiplied by the inverse 
of the rotation matrix to diagonalize the original matrix A. The third method is vector 
iteration. In this method a guess is made for x on the left hand side of the equation (8.9), 
the product Ax is formed, and compared with the right hand side of the same equation. The 
guess is then iteratively adjusted until agreement is reached. This method is sometimes also 
called the power method. The major advantage of this method is that it gives the eigen value 
and its eigen vector directly at the same time. This method always converges to the eigen value 
of largest absolute value and its corresponding eigen vector. To find the second largest eigen value of 
the system, the largest eigen value has to be removed from the system of equations once it 
has been computed. This step is called deflation. Since the covariance matrix is 
symmetrical, its eigen vectors obey the orthogonality rules. This means that 

\[ x_1^{t} x_1 = 1 \quad \text{and} \quad x_1^{t} x_2 = 0 \tag{8.10} \]

where $x_1$, $x_2$ are the eigen vectors of the matrix A corresponding to the first and second eigen 
values. This property can be used to establish a modified matrix B such that 

\[ B = A - \lambda_1 x_1 x_1^{t} \tag{8.11} \]

Multiplying this equation by any eigen vector $x_i$ gives 

\[ B x_i = A x_i - \lambda_1 x_1 x_1^{t} x_i \tag{8.12} \]

When i equals 1, the right hand side of the last equation is zero, since $A x_1 = \lambda_1 x_1$ and 
$x_1^{t} x_1 = 1$. Thus the first eigen value of B is zero, and 
all the other eigen values of B are the same as those of A. Once A has been deflated to B, the 
largest eigen value of B can be found and so on. Since only the first few principal components 
are required in this study, the third method has been used to determine the most 
important principal components. 
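
The vector iteration with deflation can be sketched as follows. This is a minimal illustration, not the implementation used in this work: it assumes a symmetric matrix with non-negative eigen values (as a covariance matrix has), uses a fixed, arbitrarily chosen starting guess, and estimates the eigen value with the Rayleigh quotient. All function names and the example matrix are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mat_vec(a, x):
    return [dot(row, x) for row in a]


def power_method(a, max_iter=1000, tol=1e-12):
    """Largest eigen value and unit eigen vector of a symmetric matrix."""
    n = len(a)
    x = [1.0 / (i + 1) for i in range(n)]  # arbitrary starting guess
    norm = dot(x, x) ** 0.5
    x = [v / norm for v in x]
    lam = 0.0
    for _ in range(max_iter):
        y = mat_vec(a, x)            # form the product A x
        lam_new = dot(x, y)          # Rayleigh quotient (x has unit length)
        norm = dot(y, y) ** 0.5
        if norm == 0.0:              # guess lies in the null space of a
            return 0.0, x
        x = [v / norm for v in y]    # adjusted guess for the next pass
        converged = abs(lam_new - lam) < tol
        lam = lam_new
        if converged:
            break
    return lam, x


def deflate(a, lam, x):
    """B = A - lam * x x^t, as in eq. (8.11)."""
    n = len(a)
    return [[a[i][k] - lam * x[i] * x[k] for k in range(n)] for i in range(n)]


def leading_eigen_values(a, count):
    """First `count` eigen values, found by repeated deflation."""
    values = []
    for _ in range(count):
        lam, x = power_method(a)
        values.append(lam)
        a = deflate(a, lam, x)
    return values


# Made-up symmetric example; its eigen values are (7 +/- sqrt(5)) / 2.
A = [[4.0, 1.0], [1.0, 3.0]]
values = leading_eigen_values(A, 2)
print("eigen values:", values)
```

The starting guess must not be orthogonal to the eigen vector being sought, otherwise the iteration cannot reach it; a practical implementation would guard against that case.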

8.4 Experimental Results 

After obtaining data from the six types of PD sources in the laboratory, 30 patterns were 
generated for each PD source. Out of these, 15 patterns were used for training of the 
minimum distance classifier while all the patterns were used for testing the classification 
accuracy after the training. The experimental results are described below. 
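
The training and testing procedure can be sketched as follows. The classifier itself is defined in chapter 4; this sketch assumes the common form in which a pattern is assigned to the class whose mean vector is nearest in Euclidean distance. The function names, class labels and feature values are made up for illustration.

```python
def class_means(training):
    """Mean feature vector of each class; training maps label -> vectors."""
    means = {}
    for label, vectors in training.items():
        n = len(vectors[0])
        means[label] = [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]
    return means


def classify(vector, means):
    """Assign the vector to the class with the nearest mean (assumed Euclidean)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(means, key=lambda label: sq_dist(vector, means[label]))


def accuracy(patterns, means):
    """Percentage of patterns assigned to their own class."""
    total = sum(len(vectors) for vectors in patterns.values())
    correct = sum(
        classify(vec, means) == label
        for label, vectors in patterns.items()
        for vec in vectors
    )
    return 100.0 * correct / total


# Made-up patterns: two classes, four patterns each, two features.
patterns = {
    "internal": [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.2]],
    "surface": [[3.0, 3.1], [2.9, 3.0], [3.2, 2.8], [3.1, 3.0]],
}
# Train on the first half of each class, test on the complete set.
training = {label: vectors[: len(vectors) // 2] for label, vectors in patterns.items()}
means = class_means(training)
result = accuracy(patterns, means)
print(f"classification accuracy: {result:.2f}%")
```

Because the test set contains the training patterns, the accuracy obtained this way is an optimistic estimate compared with testing on unseen patterns only.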

8.4.1 Variance and correlation analysis 

Before starting the principal component transformation, the variance and the correlation 
between the features were checked to determine whether the principal components exist. 
Figs. 8.1 to 8.4 show the variance of all the features of the four techniques GLDHM, 
SGLDM, GLRLM and PSM. It is clear from these figures that, with each technique, there 
are some features which have higher variance as compared to the other features. Such 
features will represent principal components as these will have higher discriminating 
power. It is interesting to note that these features will represent separate principal 
components only if there is no correlation between them. Otherwise, the principal 
components will be a combination of these features and each one of them will contribute to 
the principal components according to its relative variance. In that case, the number of 
principal components will reduce. 

Fig. 8.1: Variance of GLDHM features 

Fig. 8.2: Variance of SGLDM features 

Fig. 8.3: Variance of GLRLM features 

Fig. 8.4: Variance of PSM features 



Fig. 8.5: Correlation between the GLDHM features 


Investigating the correlation between the GLDHM features, as shown in Fig. 8.5, it was 
found that there is high correlation between the features which describe the positive half 
cycles. There is also high correlation between the features which describe the negative 
half cycles. On the other hand, there is no correlation between the features describing the 
positive half cycles and those describing the negative half cycles. Therefore, the expected number of principal 
components will be smaller as compared to the original number of features. In the case of 
GLDHM and SGLDM, there are some features which give the maximum variance and do 
not depend on the number of cycles, like features 12, 14, 17, 18 and 20 for GLDHM and 
17, 20, 24 and 27 for SGLDM. For GLRLM such features are not found except the feature 
number 12. For PSM such features are not found at all. 

8.4.2 Partial discharge data transformation 

After verifying the possible existence of principal components in the features of each 
technique, the principal component transformation was carried out for the features of the 
four techniques individually. Since the discriminating power of PSM was poor compared 
to the other techniques, as shown in the previous chapters, it was of interest to 
investigate the effect of using the principal component analysis on the data generated from 
this technique. The distributions of the six partial discharge sources using the first two 
principal components of GLDHM and PSM are shown in Figs. 8.6 and 8.7. It can be 
observed from these figures that the six classes are clearly separable with the two 
techniques, which implies that the classification accuracy obtained with PSM will improve when 
the principal component transformation is used. A similar separation between the different 
partial discharge sources has been achieved using the other techniques. The high 
separability between the classes should be reflected in the classification accuracy for the 
four techniques. 

Fig. 8.6: Distribution of partial discharge patterns using the first two principal 
components of GLDHM features 

Fig. 8.7: Distribution of partial discharge patterns using the first two principal 
components of PSM features 

8.4.3 Classification results 

Perhaps the most obvious criterion for choosing a number of principal components (say m) 
is to set a desired percentage of total variation which the selected principal components 
should contribute, say 80% or 90% [Jolliffe, 1986]. The required number of principal 
components is then the smallest number with which the chosen criterion is met. The 
definition of the percentage of variation accounted for by the first m principal components is 

\[ 100 \times \frac{\sum_{i=1}^{m} \lambda_i}{\sum_{i=1}^{p} \lambda_i} \]

where p is the total number of the principal components. 
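
The selection rule can be sketched in a few lines. The function name and the eigen values below are hypothetical and are not the values plotted in the figures of this chapter.

```python
def components_needed(eigen_values, threshold_percent):
    """Smallest m whose first m eigen values reach the given percentage
    of the total variation."""
    ordered = sorted(eigen_values, reverse=True)
    total = sum(ordered)
    running = 0.0
    for m, value in enumerate(ordered, start=1):
        running += value
        if 100.0 * running / total >= threshold_percent:
            return m
    return len(ordered)


# Made-up eigen values: cumulative percentages are 60, 85, 95, 98 and 100.
example = [6.0, 2.5, 1.0, 0.3, 0.2]
m_80 = components_needed(example, 80.0)
m_90 = components_needed(example, 90.0)
print("components for 80%:", m_80)
print("components for 90%:", m_90)
```

With these values two components reach the 80% criterion and three are needed for 90%.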

Fig. 8.8 shows the eigen values corresponding to the first ten principal components of the
features calculated from GLDHM. This figure indicates that, for any number of measured
cycles, the first three principal components represent more than 90% of the variation in the given
data. This means that only the first three principal components need be used for partial
discharge classification, since they carry most of the variation. This conclusion is supported
by Fig. 8.9, which indicates that using only the first principal component for the
training of the minimum distance classifier gave a classification accuracy of
around 65%. Increasing the number of principal components used for the training of the
classifier to two and three raised the classification accuracy from 65% to
more than 88% and 95%, respectively. Similar results were obtained with the other
techniques, SGLDM, GLRLM and PSM, as shown in Figs. 8.10 to 8.15. However, the
classification accuracy of the principal components of the PSM features is lower than
that of the principal components of the other techniques.
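
The procedure described above can be sketched as follows, under stated assumptions: the minimum distance classifier is taken here to be a nearest class mean rule with Euclidean distance (the classifier used in this work may differ in its distance measure), the data are synthetic Gaussian clusters rather than measured partial discharge patterns, and the choice of three classes and 22 features is hypothetical. Only the thirty patterns per class follow the text of this thesis.

```python
import numpy as np

def fit_pca(X, m):
    """Return the feature mean and the first m principal axes of the rows of X."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigen values
    order = np.argsort(eigvals)[::-1][:m]      # keep the m largest
    return mean, eigvecs[:, order]

def fit_min_distance(Z, labels):
    """Class labels and class means in the principal component space."""
    classes = np.unique(labels)
    means = np.stack([Z[labels == c].mean(axis=0) for c in classes])
    return classes, means

def predict_min_distance(Z, classes, means):
    """Assign each row of Z to the class with the nearest mean (Euclidean)."""
    distances = np.linalg.norm(Z[:, None, :] - means[None, :, :], axis=2)
    return classes[np.argmin(distances, axis=1)]

# Synthetic stand-in data: 3 hypothetical classes, 30 patterns each, 22 features.
rng = np.random.default_rng(0)
n_per_class, n_features = 30, 22
centres = rng.normal(scale=5.0, size=(3, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centres])
y = np.repeat(np.arange(3), n_per_class)

mean, axes = fit_pca(X, m=2)
Z = (X - mean) @ axes                          # patterns in the first two components
classes, means = fit_min_distance(Z, y)
accuracy = np.mean(predict_min_distance(Z, classes, means) == y)
print(f"accuracy with 2 components: {accuracy:.2f}")
```

Changing `m` in `fit_pca` reproduces the kind of comparison made in the text between one, two and three principal components, although the accuracies obtained on synthetic data say nothing about the measured values.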
Fig. 8.8 : Eigen values of the first ten principal components of GLDHM

Fig. 8.10 : Eigen values of the first ten principal components of SGLDM

Fig. 8.12 : Eigen values of the first ten principal components of GLRLM

Fig. 8.13 : Classification accuracy of GLRLM principal components

Fig. 8.14 : Eigen values of the first ten principal components of PSM

Fig. 8.15 : Classification accuracy of PSM principal components


It is interesting to investigate the relationship between the first three principal components
and the original set of features. The relation between the GLDHM features and their first
three principal components is shown in Fig. 8.16. The sum of all squared
contributions to any component was normalized to unity. By observing the correlation
between the GLDHM features that gave the maximum variation, it was found that
features 12, 14, 18 and 20 have almost the same variance. It was also found that the
correlation between each pair of these features was around 97%. On the other hand, the
correlation between feature 17, which has the maximum variance, and each of features 12, 14,
18 and 20 was less than 54%. Therefore, the contributions of features 12, 14, 18 and 20 to the
first principal component were almost equal, while the third principal component mainly
consisted of feature 17, as shown in Fig. 8.16. It was also found that the first
component mainly consisted of features 12, 14, 17, 18, 20 and 21, the second
of features 1, 3, 4, 7, 9 and 10, and the third of features 3, 9, 12, 14, 17, 18 and
20.
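
The link between highly correlated features and nearly equal contributions can be illustrated with a short sketch. It assumes that the contributions are the squared entries of the unit-norm eigenvectors of the covariance matrix (so that each component's contributions sum to unity); whether the covariance or the correlation matrix was used in this work is not stated here, and the three synthetic features below are purely hypothetical.

```python
import numpy as np

def squared_contributions(X, m=3):
    """Squared eigenvector entries: share of each feature in the first m components."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:m]      # largest eigen values first
    return eigvecs[:, order] ** 2              # each column sums to 1

rng = np.random.default_rng(1)
base = rng.normal(size=200)
X = np.column_stack([
    base + 0.05 * rng.normal(size=200),        # feature 0, strongly correlated with feature 1
    base + 0.05 * rng.normal(size=200),        # feature 1
    0.3 * rng.normal(size=200),                # feature 2, independent of the others
])

contrib = squared_contributions(X, m=2)
corr = np.corrcoef(X, rowvar=False)
print(np.round(contrib, 3))
print(np.round(corr, 3))
```

In this synthetic case the two strongly correlated features share the first component almost equally, while the independent feature dominates the second, which is the same qualitative pattern as described for features 12, 14, 18, 20 and feature 17.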
Fig. 8.16a : Contributions of GLDHM features to the first principal component

Fig. 8.16b : Contributions of GLDHM features to the second principal component

Fig. 8.16c : Contributions of GLDHM features to the third principal component

Fig. 8.16 : The relation between the GLDHM features and the first three principal
components.


For SGLDM, it was found that the correlations between feature 17 and features 20, 24 and
27 were 81%, 99% and 99%, respectively, whereas the correlations among features
20, 24 and 27 were more than 98%. Fig. 8.17 shows the relation between the SGLDM features
and their first three principal components. From this figure, it can be observed that the first
principal component mainly consists of features 17, 20, 24 and 27, the second principal
component mainly consists of features 3, 4, 6, 10, 11 and 13, and the third principal
component consists of features 7, 18, 21 and 25. A close examination of Figs. 8.16 and 8.17
reveals that the features contributing most to the first principal component, for both
GLDHM and SGLDM, belong to the negative half cycle, whereas those
contributing most to the second principal component belong to the positive half cycle.
In both cases, the contributions of features belonging to the horizontal direction are equal
to or slightly better than those of features belonging to the vertical direction.

For GLRLM, similar results have been obtained. The first component consists of
features 11, 12 and 20, the second of features 6, 7, 8 and 9, and the third
component of features 10, 12 and 20.
Fig. 8.17a : The contributions of SGLDM features to the first principal component

Fig. 8.17b : The contributions of SGLDM features to the second principal component

Fig. 8.17c : The contributions of SGLDM features to the third principal component

Fig. 8.17 : The relation between the SGLDM features and the first three principal
components.
For PSM, the situation was different: the first component consisted of the first half of
the features with almost equal contributions and nearly zero contribution from the second
half of the features, while the second component was just the opposite, consisting mainly
of the second half of the features. The third was a mixture of the whole set of
features.

From the above observations, it is clear that, for three of the four algorithms utilized in this
work, the first principal component depends upon the features calculated from scanning the
partial discharge patterns of the negative half cycle, the second is based upon the features
calculated from scanning the partial discharge patterns of the positive half cycle, and the
third is a mixture of the positive and negative half cycles. The features resulting from the
horizontal direction are slightly better than the features resulting from the vertical direction.

8.5 Principal Component Transformation to the Cable Measurements 
In this section, the principal component analysis has been applied to the measurements of
the practical case of the cable sample described in chapter 7. The eigen values of the first
six principal components of GLDHM only are shown in Fig. 8.18. From this figure, one
can observe that the first principal component accounts for more than 95% of the total
variation in the given measurements. By using the first principal component for the
training of the minimum distance classifier, classification accuracies of
100%, 93.33% and 96.66% were obtained for the cases of 2, 4 and 8 cycles, respectively, as shown in Fig.
8.19. By using two principal components, the classification accuracy in the case of 4 cycles
improved to 95%, while the classification accuracy for the other two cases of 2 and 8 cycles
remained the same. Thus, the first two principal components are sufficient to discriminate
the two sources of partial discharge in the cable.






Fig. 8.18 : Eigen values of the first six principal components of GLDHM 
features used for the cable measurements 



Fig. 8.19 : Classification accuracy of GLDHM principal components
used for the cable measurements 


Figs. 8.20a to 8.20c show the relative contributions of the various GLDHM features
belonging to the horizontal and vertical directions of the positive and negative half cycles to
the first three principal components. From these figures, the following
observations can be made:

(i) The first and the second principal components are equally divided between 
the positive and negative half cycles. 
(ii) In each half cycle, the contributions of the features calculated from the
horizontal and vertical directions are exactly the same. For example,
features 1 & 7 and 3 & 9 in the positive half cycle and features 12 & 18 and 14
& 20 in the negative half cycle have equal contributions.

(iii) The third principal component mainly consists of feature number 17. 

The contributions of the GLDHM features to the first two principal components in this case
differ from the contributions of these features in the case of the original six partial
discharge sources shown in Fig. 8.16. The main reason for this difference is the
presence of the glow corona in the case of six partial discharge sources. Since the glow
corona has partial discharge pulses only in the positive half cycles, the negative half cycle
distinguishes this source very easily from the other sources. Because of this, the first
principal component basically consisted of features from the negative half cycle in the case
of six partial discharge sources.
Fig. 8.20a : The contributions to the first principal component

Fig. 8.20b : The contributions to the second principal component

Fig. 8.20c : The contributions to the third principal component

Fig. 8.20 : The relation between the GLDHM features and the first three
principal components for the cable measurements


As discussed in the preceding sections, the principal component analysis has been carried out for
the six partial discharge sources described in chapter 3 and for the two partial
discharge sources of the cable sample described in chapter 7. In this section, the principal
component analysis has been carried out for all eight partial discharge sources together. The
different sources of partial discharge are now named class 1 to class 8, which
represent the glow corona, streamer corona, surface discharge, internal discharge, single
protrusion, multi protrusions, internal-surface discharge in cable and internal discharge
in cable, respectively. The ten largest eigen values are shown in Fig. 8.21. The first three
principal components cover more than 90% of the overall variation of the data. When the
minimum distance classifier was trained for the eight partial discharge sources, the
classification accuracies were found to be 54%, 90% and 97.5% with one, two
and three principal components, respectively, as shown in Fig. 8.22 for 2, 4 and 8 cycles. Hence, the
first three principal components are sufficient to achieve a considerable classification
accuracy. The contributions of the features to the first three principal components in the
case of eight partial discharge sources are almost the same as the contributions of the features
to the first three principal components in the case of the original six partial discharge
sources, as is evident from Figs. 8.23 and 8.16. Fig. 8.24 shows the distribution of the partial
discharge patterns of the eight partial discharge sources using the first two principal
components of GLDHM features. It is clear from this figure that the high classification
accuracy results from the good separation between the different partial discharge sources
obtained by using the principal component transformation.
Fig. 8.21 : Eigen values of the first ten principal components of GLDHM
for the eight partial discharge sources

Fig. 8.22 : Classification accuracy of GLDHM principal components used
for the eight partial discharge sources

Fig. 8.23a : The contributions of GLDHM features to the first principal component
for the eight partial discharge sources

Fig. 8.23b : The contributions of GLDHM features to the second principal component
for the eight partial discharge sources

Fig. 8.23c : The contributions of GLDHM features to the third principal component
for the eight partial discharge sources

Fig. 8.24 : Distribution of partial discharge patterns of the eight partial discharge
sources using the first two principal components of GLDHM features


8.6 Conclusion 

In this chapter, the principal component transformation of the features based on four
texture analysis techniques has been investigated to determine its ability to classify
different partial discharge sources. These four texture analysis techniques are the gray level
difference histogram method, the spatial gray level dependence method, the gray level run length method
and the power spectrum method. It was found that by using the first three principal
components, for any of these techniques, more than 95% classification accuracy can be
achieved. Generally, the first principal component was obtained from the features which
describe the negative half cycle, the second from those of the positive half cycle and the third from
the whole set of features. The features resulting from the horizontal direction exhibit the
same or slightly better performance than the features resulting from the vertical direction.
The first three principal components account for more than 95% of the total variation in the
data. However, the increase in the classification accuracy due to the increase in the number of
principal components from two to three is not significant. Therefore, in most of the cases
two principal components are sufficient to achieve a considerable classification accuracy
for any number of power frequency cycles used to construct the partial discharge patterns. For each
technique, changing the number of power frequency cycles used to construct the partial
discharge patterns has a negligible effect on the classification accuracy of the principal
components. The application of the principal component transformation is very useful in
reducing the number of features needed to characterize any partial discharge
source and hence in minimizing the classification time. These observations are found to be
valid for the six partial discharge sources created in the laboratory and also for the partial
discharges in the case of the practical cable sample.
Chapter 9
Conclusions 


9.1 General 


Partial discharge is an electric discharge that does not completely bridge the insulation. It
gives rise to electric pulses having a magnitude (q) and a phase position (φ) with respect to
the applied voltage waveform. It has been recognized that the breakdown of insulation of
electrical equipment is often caused by the occurrence of partial discharge within or
on the surface of the insulation. Therefore, if partial discharge is found in any insulation
system, it is important to identify its source, in other words, to classify the unknown partial
discharge source. The most important step in the classification process is to get the exact
finger prints which can represent the different partial discharge sources successfully.
During the last decade, partial discharge finger prints have commonly been formed by
phase resolved pulse height analysis using the (q-φ-n) distributions, where n is the
repetition rate of partial discharge pulses. Conventionally, classification of partial
discharge sources has been done with the help of the shape of these distributions. However,
these distributions suffer from the averaging effect and do not take the memory
propagation between the partial discharge pulses into consideration.

Texture analysis algorithms have been successfully applied in the field of pattern
recognition. Since the texture features contain information about the spatial distribution of
spectral variation, they are applied especially in image processing. Texture
analysis algorithms have been used to study the gray level variation of an image in
different directions (horizontal, vertical, left diagonal and right diagonal). These algorithms
can also be used to study the variation of the partial discharge pulses during the
measurements. In this thesis, the concept of texture analysis has been applied by
replacing the gray level values used in image analysis with the partial discharge pulse
magnitudes, so that the texture features give an overview of every single partial
discharge pulse in the whole measurement. Investigations in this thesis were conducted in
the horizontal and vertical directions only, to study the relation between adjacent
pulses in the same cycle as well as the relation between a pulse and the other pulses at the
same phase angle in different cycles. An attempt was made to understand some of the
issues in obtaining reliable features from texture analysis algorithms and to evaluate their ability
to classify different partial discharge sources using an automated personal computer
system. Four texture analysis algorithms have been studied in this work. Three of them,
the spatial gray level dependence method (SGLDM), the gray level difference histogram
method (GLDHM) and the gray level run length method (GLRLM), operate in the spatial domain,
whereas the fourth, the power spectrum method (PSM), operates in the frequency
domain. The work carried out in the present thesis broadly covers the following:


• The classification accuracy of the different texture analysis techniques has been
calculated by using the minimum distance classifier, and the discriminating power of
the texture features has been compared with that of the conventional (q-φ-n) method.

• The relative classification accuracy of every feature in each texture analysis algorithm
has been calculated by using the minimum distance classifier to determine the
maximum classification accuracy for identifying the best feature.

• Dissimilarity between different partial discharge sources was measured using the 
transformed divergence analysis as a direct feature selection technique to reduce the 
classification time by eliminating the features which contribute little to the separation 
between the partial discharge sources. 

• The discriminating power of different features was also calculated by using an
artificial neural network (ANN) as a non parametric classifier to overcome the problem
of partial discharge sources which did not have a normal distribution in their feature
space.

• The principal component analysis was also used, as an indirect feature reduction 
method, to determine the number of principal components which could be used at a 
time to achieve reasonable classification accuracy. 

The above methods were applied for the classification of six partial discharge sources created
in the laboratory and also for the partial discharge measured from a practical cable sample.

9.2 Main Conclusions 

From the research work carried out in this thesis, the following main conclusions can be
drawn:

1. Texture analysis algorithms can be used to generate different features which are 
capable of distinguishing between different partial discharge sources. 

2. The classification accuracy of features generated from the SGLDM and GLDHM
algorithms is much better than that of the features generated from the phase resolved pulse
height distributions, especially for classifying unknown partial discharge patterns not
used for training of the classifier.

3. PSM is computationally intensive and it uses complex numbers rather than real
numbers. It also needs more computer memory to store the original as
well as the transformed partial discharge patterns.

4. Most of the features of the GLDHM and SGLDM have the same discriminating power.

5. GLDHM has a classification accuracy at par with the other algorithms and is 
computationally faster. Therefore, it is recommended for partial discharge source 
classification. 

6. Using the minimum distance classifier, for each of the algorithms, two features at a 
time are sufficient to achieve a considerable classification accuracy. 

7. For each of the algorithms, employing the average transformed divergence, two features at
a time are also sufficient to achieve a complete separation between the different partial
discharge sources.

8. However, the discriminating power of the features using the minimum distance
classifier and the separation between the classes using the average transformed
divergence depend on the number of cycles used to construct the patterns from which the
features are generated. They also depend upon the partial discharge sources considered for
classification.

9. The artificial neural network (ANN) is relatively time consuming during the
learning phase. Using a single feature for training, the classification accuracy of the
best features was less than the classification accuracy of the best features obtained
with the minimum distance classifier for all four techniques in this work. A similar
result was obtained when two features were used at a time for the network training.
The overlap between the different partial discharge classes affects the ability of
the ANN to classify.

10. For the principal component transformation, two principal components were found to
be sufficient to achieve a considerable classification accuracy in most of the cases.

11. In each half cycle, the contributions of the features related to the horizontal and vertical
directions are almost the same for the first and second principal components.

12. The classification accuracy of the principal components is independent of the number
of cycles used to construct the partial discharge patterns.

13. The application of the principal component transformation reduces the number of
features needed for partial discharge classification and hence the classification time.
These observations were found equally valid for the six partial discharge sources
created in the laboratory and the two partial discharge sources in the practical cable sample.
Therefore, the principal component transformation is recommended as a feature selection
technique for partial discharge source classification.

9.3 Scope for Further Work 

In this thesis, the features calculated from the texture analysis algorithms have been used
successfully for the classification of partial discharge sources. These features exhibit
good performance compared to the conventional phase resolved pulse height method. The
features calculated from the conventional method have been reported in investigations of the
effect of aging on the shape of the q-φ-n distributions. It has also been reported that the
shape of these distributions changes with aging of the insulation. However, it is felt that the
extent of the insulation damage can be studied in detail by observing these features, for
monitoring the condition of a particular insulation. The features of texture analysis
algorithms have proven to be superior to the conventional q-φ-n distribution. By
observing the texture features, the assessment of the electrical insulation condition can be
achieved.

In the present thesis, only the multilayer perceptron model of ANN has been tried out. Some of
the recent ANN models and a combination of fuzzy logic and ANN can also be tried out
for partial discharge classification.
Appendix 1: Histogram of the Features of the Texture Analysis Algorithms


To represent the partial discharge sources in the feature space, thirty different
patterns have been generated for each source in this study. Since the minimum distance classifier and
the transformed divergence are based on the assumption that the partial discharge sources
are normally distributed, it is very important to investigate the scatter of the partial
discharge patterns in the feature space to check the validity of this assumption. The
scatter of the partial discharge sources in the feature space depends upon the features used.
The scatter of the six partial discharge sources described in chapter 3, using all the
features of the four texture analysis algorithms, is given in this appendix. For SGLDM,
GLDHM, GLRLM and PSM, the features are shown in two columns: the first column is for
the horizontal and vertical directions of the positive half cycle, while the second column is
for the horizontal and vertical directions of the negative half cycle. For
the q-φ-n distributions, the histograms are shown in three columns: the first
for the features derived from the (qav-φ) distribution, the second for the features derived from
the (qmax-φ) distribution and the last for the features derived from the (n-φ) distribution.
Generally, in any histogram, the horizontal direction represents the expected values of the
sources, while the vertical direction represents the repetition of these values. In this study,
each texture analysis technique has features with different scaling. Therefore, for each
feature, the minimum and maximum values have been determined to find the range of
variation. This range was then divided into 10 equal intervals and the repetition of the
partial discharge patterns of each class in these intervals was determined. Therefore, in the
following histograms, the horizontal axis represents the order of the intervals instead of their
actual values.
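
The binning described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the use of `numpy.histogram` and the example values are hypothetical, and the common range is taken over all classes for one feature, as the text suggests.

```python
import numpy as np

def class_histograms(values, labels, n_bins=10):
    """Count, per class, how many patterns fall in each of n_bins equal intervals.

    The intervals span the minimum to the maximum of the feature over all classes,
    so the bin index (interval order) is comparable between classes.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    lo, hi = values.min(), values.max()
    counts = {}
    for c in np.unique(labels):
        hist, _ = np.histogram(values[labels == c], bins=n_bins, range=(lo, hi))
        counts[c.item()] = hist
    return counts

# Hypothetical values of one feature for two classes.
feature = np.array([0.0, 0.5, 9.5, 10.0, 5.0, 5.0])
source = np.array([0, 0, 0, 0, 1, 1])
counts = class_histograms(feature, source)
print(counts[0])
print(counts[1])
```

Plotting each array in `counts` against the interval order 1 to 10 gives one histogram per class on a common horizontal axis, independent of the scaling of the feature.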

Histograms of the GLDHM features 

Positive half cycle Negative half cycle 

Horizontal direction 
Histograms of the PSM features